CN110795542B - Dialogue method, related device and equipment - Google Patents
Dialogue method, related device and equipment
- Publication number
- CN110795542B CN201910806215.5A
- Authority
- CN
- China
- Prior art keywords
- question
- questions
- word
- server
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the invention disclose a dialogue method, a related device and equipment. In the dialogue method, first response information corresponding to set questions is preset in a question-answer database. After a server receives a dialogue request for a first question sent by a first terminal, the server first tries to match a question similar to the first question in the question-answer database and returns the first response information corresponding to the matched question to the first terminal. When no question similar to the first question is matched, the server generates response information for the first question using a QA system. The method optimizes the chit-chat dialogue process and improves the accuracy of the response information.
Description
Technical Field
The embodiment of the application relates to the technical field of artificial intelligence, in particular to a dialogue method, a related device and equipment.
Background
With the continuous development of artificial intelligence, people interact with intelligent devices more and more frequently, and advances in intelligent devices continue to enrich people's lives. Processing information and giving feedback on it are key aspects of artificial intelligence technology.
At present, in the field of chit-chat robots, the processing of acquired information relies mainly on a parameterized evaluation system: the input information is analyzed and compared objectively and parametrically, and suitable information is matched in a knowledge base so that the user obtains corresponding feedback. However, in the chit-chat field it is difficult for a question answering system (QA system) to give highly accurate replies to strongly subjective questions through a parameterized evaluation system. Improving the accuracy of replies in the strongly subjective chit-chat field is therefore a technical problem that urgently needs to be solved.
Disclosure of Invention
The embodiments of the invention disclose a dialogue method, a related device and equipment, which can solve the prior-art technical problem of low accuracy of replies to strongly subjective questions, so as to optimize chit-chat dialogue and improve the accuracy of responses.
In a first aspect, an embodiment of the present application provides a dialogue method, including:
a server receives a dialogue request sent by a first terminal, wherein the dialogue request is used for requesting, from the server, response information corresponding to a first question;
the server searches N questions of a question-answer database for a second question matching the first question, wherein the question-answer database comprises the N questions and first response information corresponding to each of the N questions, and N is a positive integer;
when the second question is found, the server sends the first response information corresponding to the second question in the question-answer database to the first terminal;
when the second question is not found, the server generates response information for the first question through a question answering system (QA system);
and the server sends the generated response information to the first terminal.
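The flow of the first aspect can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `respond`, `find_match` and `qa_system` are hypothetical, the question-answer database is modeled as an in-memory dictionary, and both the matcher and the QA system are passed in as plain functions.

```python
from typing import Callable, Dict, List, Optional


def respond(first_question: str,
            qa_database: Dict[str, str],
            find_match: Callable[[str, List[str]], Optional[str]],
            qa_system: Callable[[str], str]) -> str:
    """Answer from the question-answer database when a match exists,
    otherwise fall back to the QA system (both callables are assumptions)."""
    # Look for a second question that matches the first question.
    second_question = find_match(first_question, list(qa_database))
    if second_question is not None:
        # Hit: return the preset first response information.
        return qa_database[second_question]
    # Miss: let the QA system generate response information.
    return qa_system(first_question)
```

Keeping the matcher as a parameter leaves the similarity measure open; any function that returns either a stored question or `None` can be plugged in.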
As a possible implementation, the server searching the N questions of the question-answer database for a second question matching the first question specifically includes:
determining the maximum similarity in a first similarity set, wherein the first similarity set comprises the similarity between the first question and each of the N questions;
if the maximum similarity is greater than a first threshold, determining that the second question is the question corresponding to the maximum similarity;
and if the maximum similarity is smaller than the first threshold, determining that the second question is not found in the question-answer database.
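The threshold rule above can be sketched as follows. The function names are hypothetical, and `word_overlap` is only a stand-in similarity measure used to make the example runnable. The text does not say what happens when the maximum similarity equals the first threshold; this sketch assumes that case counts as "not found".

```python
from typing import Callable, List, Optional


def word_overlap(a: str, b: str) -> float:
    """Stand-in similarity (Jaccard overlap of words); an assumption."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def find_second_question(first_question: str,
                         questions: List[str],
                         similarity: Callable[[str, str], float],
                         first_threshold: float) -> Optional[str]:
    """Return the stored question with the maximum similarity if that
    similarity exceeds the first threshold, otherwise None."""
    if not questions:
        return None
    # First similarity set: similarity of the first question to each stored one.
    scored = [(similarity(first_question, q), q) for q in questions]
    best_score, best_question = max(scored, key=lambda pair: pair[0])
    return best_question if best_score > first_threshold else None
```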
As a possible implementation, a third question is any one of the N questions, and the method further includes:
calculating the cosine similarity between the word frequency vector of the first question and the word frequency vector of the third question to obtain the similarity between the first question and the third question.
As a possible implementation manner, the method further comprises:
the server acquires a first question set, wherein the first question set is a set of questions to which the server has historically responded;
the server groups the questions in the first question set to obtain L question groups;
the server screens, from the L question groups, K question groups whose total frequency is greater than a second threshold, according to the total frequency of each of the L question groups, wherein the total frequency of a first question group is the sum of the frequencies with which the server has historically responded to all questions in the first question group, and the first question group is any one of the L question groups.
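The screening step can be sketched as follows, assuming the per-question response frequencies are already available as a dictionary. The name `screen_groups` and the data layout are illustrative assumptions.

```python
from typing import Dict, List


def screen_groups(groups: List[List[str]],
                  frequency: Dict[str, int],
                  second_threshold: int) -> List[List[str]]:
    """Keep the question groups whose total frequency (sum of the historical
    response frequencies of their questions) exceeds the second threshold."""
    return [group for group in groups
            if sum(frequency.get(q, 0) for q in group) > second_threshold]
```

With L input groups, the length of the returned list is the K of the text.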
As a possible implementation, grouping the questions in the first question set to obtain L question groups includes:
calculating the similarity between a fourth question and each question in a second question set, wherein the second question set is the set of questions in the first question set that have not yet been grouped, and the fourth question is one question in the second question set;
dividing the questions in the second question set whose similarity to the fourth question is greater than a third threshold into one question group.
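One way to read the grouping step is as a greedy loop that repeats until no ungrouped question remains. The sketch below makes that assumption explicit. It also assumes that the fourth question is simply the first ungrouped question and that it always joins its own group. `word_overlap` is again a stand-in similarity, not the measure described in the patent.

```python
from typing import Callable, List


def word_overlap(a: str, b: str) -> float:
    """Stand-in similarity (Jaccard overlap of words); an assumption."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def group_questions(first_question_set: List[str],
                    similarity: Callable[[str, str], float],
                    third_threshold: float) -> List[List[str]]:
    """Greedily split the questions into groups of mutually similar questions."""
    ungrouped = list(first_question_set)  # the "second question set"
    groups: List[List[str]] = []
    while ungrouped:
        fourth_question, rest = ungrouped[0], ungrouped[1:]
        group, remaining = [fourth_question], []
        for question in rest:
            if similarity(fourth_question, question) > third_threshold:
                group.append(question)
            else:
                remaining.append(question)
        groups.append(group)
        ungrouped = remaining  # only still-ungrouped questions are revisited
    return groups
```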
As a possible implementation, a fifth question is any one question in the second question set, and calculating the similarity between the fourth question and each question in the second question set specifically includes:
determining keywords of the fourth question according to the weight of each word in the fourth question, wherein the weight of a first word represents the contribution of the first word to the semantics of the fourth question, and the first word is any word in the fourth question;
determining keywords of the fifth question according to the weight of each word in the fifth question, wherein the weight of a second word represents the contribution of the second word to the semantics of the fifth question, and the second word is any word in the fifth question;
determining the word frequency vectors of the fourth question and the fifth question according to the keywords of the fourth question and the keywords of the fifth question;
and calculating the cosine similarity between the word frequency vector of the fourth question and the word frequency vector of the fifth question to obtain the similarity between the fourth question and the fifth question.
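The four steps above can be sketched as follows under several assumptions that the text leaves open: questions arrive already segmented into words, the word weights are supplied by the caller (for example from TF-IDF or TextRank), the keywords are the `top_k` highest-weighted words, and the word frequency vectors are built over the union of both keyword lists. All names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def keywords(words: List[str], weight: Dict[str, float], top_k: int) -> List[str]:
    """Pick the top_k distinct words with the highest weight."""
    unique = list(dict.fromkeys(words))  # drop repeats, keep first-seen order
    ranked = sorted(unique, key=lambda w: weight.get(w, 0.0), reverse=True)
    return ranked[:top_k]


def keyword_similarity(fourth: List[str], fifth: List[str],
                       weight4: Dict[str, float], weight5: Dict[str, float],
                       top_k: int = 3) -> float:
    """Cosine similarity of word frequency vectors restricted to keywords."""
    vocabulary = list(dict.fromkeys(
        keywords(fourth, weight4, top_k) + keywords(fifth, weight5, top_k)))
    count4, count5 = Counter(fourth), Counter(fifth)
    v4 = [count4[w] for w in vocabulary]
    v5 = [count5[w] for w in vocabulary]
    dot = sum(a * b for a, b in zip(v4, v5))
    norm4 = math.sqrt(sum(a * a for a in v4))
    norm5 = math.sqrt(sum(b * b for b in v5))
    return dot / (norm4 * norm5) if norm4 and norm5 else 0.0
```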
As a possible implementation manner, the method further comprises:
the server generates at least one piece of response information for a sixth question, wherein the sixth question is any one question in the K question groups;
the server sends the sixth question and the at least one piece of response information to a second terminal, so that the second terminal receives and displays the sixth question and the at least one piece of response information and sends first response information corresponding to the sixth question to the server, wherein the first response information corresponding to the sixth question is either the response information selected at the second terminal from the at least one piece of response information or response information input at the second terminal for the sixth question;
the server receives the first response information corresponding to the sixth question and updates the question-answer database with the sixth question and the first response information corresponding to the sixth question.
In a second aspect, an embodiment of the present application further provides a dialogue method, including:
the first terminal generates a dialogue request according to an input first question, wherein the dialogue request is used for requesting, from the server, response information corresponding to the first question;
after receiving the dialogue request, the server searches for a second question matched with the first question in N questions of a question-answer database, and sends first answer information corresponding to the second question in the question-answer database to the first terminal when the second question is found, wherein the question-answer database comprises N questions and first answer information corresponding to each of the N questions, and N is a positive integer;
the first terminal receives and outputs the first response information.
As a possible implementation manner, the method further comprises:
receiving and displaying a sixth question sent by the server and at least one piece of response information corresponding to the sixth question;
receiving first response information corresponding to the sixth question, wherein the first response information corresponding to the sixth question is either the response information selected from the at least one piece of response information corresponding to the sixth question or response information input for the sixth question;
and sending the sixth question and the first response information corresponding to the sixth question to the server so that the server updates the sixth question and the first response information corresponding to the sixth question to the question-answer database.
A third aspect discloses a dialog device, comprising:
a receiving unit, configured to receive a dialogue request sent by a first terminal, where the dialogue request is used to request, from the server, response information corresponding to a first question;
a searching unit, configured to search N questions of a question-answer database for a second question matching the first question, wherein the question-answer database comprises the N questions and first response information corresponding to each of the N questions, and N is a positive integer;
a sending unit, configured to send, when the second question is found, first answer information corresponding to the second question in the question-answer database to the first terminal;
A first generation unit configured to generate response information for the first question by a question-and-answer system (question answering system, QA system) when the second question is not found;
the sending unit is further configured to send the generated response information to the first terminal.
It should be noted that the apparatus may further comprise other units for performing the dialogue method disclosed in the first aspect or any embodiment of the first aspect.
A fourth aspect discloses a dialog device, comprising:
a generation unit, configured to generate a dialogue request according to an input first question, wherein the dialogue request is used for requesting, from the server, response information corresponding to the first question;
a sending unit, configured to send a dialogue request to the server, so that after the server receives the dialogue request, the server searches for a second question that matches the first question from N questions in a question-answer database, where the server sends first answer information corresponding to the second question in the question-answer database to the first terminal if the second question is found, and generates answer information for the first question through a QA system and sends the generated answer information to the first terminal if the second question is not found, where the question-answer database includes N questions and first answer information corresponding to each of the N questions, where N is a positive integer;
And the receiving unit is used for receiving and outputting the received response information.
It should be noted that the apparatus may further include other units for performing the dialogue method disclosed in the second aspect or any embodiment of the second aspect.
A fifth aspect discloses a dialog device comprising a processor and a memory, the processor being connected to the memory, wherein the memory is for storing program code, the processor being for invoking the program code to implement a dialog method as disclosed in the first aspect or any embodiment of the first aspect.
A sixth aspect discloses a dialog device comprising a processor and a memory, the processor being connected to the memory, wherein the memory is for storing program code, the processor being for invoking the program code to implement a dialog method as disclosed in the second aspect or any embodiment of the second aspect.
A seventh aspect discloses a computer readable storage medium storing a computer program or computer instructions which, when executed, implement a dialog method as disclosed in the first aspect or any embodiment of the first aspect.
An eighth aspect discloses a computer readable storage medium storing a computer program or computer instructions which, when executed, implement a dialog method as disclosed in the second aspect or any embodiment of the second aspect.
In the embodiments of the invention, a server receives a dialogue request sent by a first terminal, wherein the dialogue request is used for requesting, from the server, response information corresponding to a first question. The server searches N questions of a question-answer database for a second question matching the first question, wherein the question-answer database comprises the N questions and first response information corresponding to each of the N questions. When the second question is found, the server sends the first response information corresponding to the second question in the question-answer database to the first terminal; when the second question is not found, the server generates response information for the first question through the QA system and sends the generated response information to the first terminal. By implementing the embodiments of the invention, first response information corresponding to set questions is preset in the question-answer database. After the server receives a dialogue request for a question, it can first try to match a question similar to the first question of the dialogue request in the question-answer database and return the first response information corresponding to the matched question to the first terminal, thereby optimizing the chit-chat dialogue and improving the accuracy of the response information.
Drawings
To illustrate the technical solutions in the embodiments of the invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a system architecture according to an embodiment of the present invention;
FIG. 2 is a schematic illustration of a first server 102 responding to a dialogue request according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a graphical user interface according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of a dialogue method according to an embodiment of the present invention;
FIG. 5 illustrates a method for calculating the similarity between the first question and the third question according to an embodiment of the present invention;
FIG. 6 is a schematic flow chart of updating a question-answer database according to an embodiment of the present invention;
FIG. 7 illustrates a method of grouping a first question set according to an embodiment of the present invention;
FIG. 8 illustrates a method of calculating the similarity of two questions in the grouping process according to an embodiment of the present invention;
FIG. 9a is a schematic diagram of a dialogue device according to an embodiment of the present invention;
FIG. 9b is a schematic diagram of another dialogue device according to an embodiment of the invention;
FIG. 10 is a schematic diagram of a dialogue device according to another embodiment of the present invention;
FIG. 11 is a schematic diagram of a dialogue device according to another embodiment of the present invention;
FIG. 12 is a schematic structural diagram of another dialogue device according to an embodiment of the present invention.
Detailed Description
The embodiments of the invention disclose a dialogue method and a dialogue device for optimizing chit-chat dialogue. The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
It is noted that the terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
Concepts and terms referred to in the embodiments of the present application will be briefly described first.
(1) Question
A question in the embodiments of the present application may be an interrogative sentence, a declarative sentence, an exclamatory sentence, an imperative sentence, or another sentence pattern. For example, "How is the weather today?"; for another example, "You look really good."; for another example, "The sea is so beautiful!". This is not limited herein.
(2) Response information
The response information in the embodiments of the present application may be an image, audio, text, or another type. For example, for the question "What can I eat to whiten my skin?", the response information may be the text "lemon"; for another example, for the question "Recommend a piece of pure music", the response information may be audio data, such as the pure music piece "wish to take a monarch on a moon-by-moon basis"; as another example, for the question "picture of Disney principals", the response information may be an image, such as one or more pictures of Disney principals.
(3) QA system
The question answering system (QA system) is an advanced form of information retrieval system. It combines knowledge representation, information retrieval, natural language processing and other technologies so that users can express information needs as natural-language questions rather than keyword combinations. The QA system can analyze an input question, automatically find the most likely response information from content resources such as electronic documents, and return that response information.
A question answering system is not a simple character-matching process: it must understand a question deeply enough to recognize its semantics or intent before it can search a large-scale knowledge base for the required response information. In addition, when understanding natural language, the question answering system needs to consider the "scenario" of the question. For "nearby restaurants", for example, the system must recognize that the question belongs to the geographic-information category and must also obtain the geographic location of the questioner. In some cases, questions expressed in natural language have the same meaning but very different wording. For example, with "nearby restaurants" and "I want to eat", the user's intent in both cases is to find nearby restaurants, although the expressions differ considerably.
(4)TF-IDF
TF-IDF (term frequency-inverse document frequency) is a common weighting technique used in information retrieval and data mining and is a statistical method. In the embodiments of the present application, TF-IDF is used to evaluate the importance of a word to a question. The importance of a word to a question increases with the frequency with which the word appears in that question, and decreases as the frequency with which the word appears across the question set increases. For a word in a question, the TF-IDF of the word can be obtained by the following formula:
TF-IDF=TF×IDF
Here TF is the term frequency, which indicates how frequently the word appears in the question, and IDF is the inverse document frequency. The main idea of IDF is that the fewer questions in a question set contain a word t, the larger the IDF of t and the better t distinguishes between categories.
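The text gives only the product TF × IDF and does not spell out either factor. The sketch below therefore assumes one common choice: TF is the word count divided by the question length, and IDF is the logarithm of the number of questions divided by the number of questions containing the word. Questions are assumed to be pre-segmented word lists, and the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(question: List[str], question_set: List[List[str]]) -> Dict[str, float]:
    """TF-IDF of every word in `question` relative to `question_set`.

    `question` is expected to be one of the questions in `question_set`;
    the guard below only avoids division by zero if it is not.
    """
    counts = Counter(question)
    total_questions = len(question_set)
    scores = {}
    for word, count in counts.items():
        tf = count / len(question)
        containing = sum(1 for q in question_set if word in q)
        idf = math.log(total_questions / max(containing, 1))
        scores[word] = tf * idf
    return scores
```

A word that appears in every question of the set gets an IDF of zero under this definition, which matches the idea that such a word does not help distinguish questions.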
(5) Word frequency vector
The word frequency vector in the embodiments of the present application is the word frequency vector of a question, that is, a question expressed in vector form. The words contained in a question set (a collection of several questions) form a word set, and each question in the question set can then be represented as a vector in the multidimensional space spanned by the word set. For example, a question set includes a first question and a third question, wherein:
The first question is: the price of this box is expensive, and that price is appropriate.
The third question is: the price of this box is not inexpensive, and that one is more appropriate.
The question set forms the word set { this, box, price, expensive, that, appropriate, not, inexpensive, more }.
The word frequencies of the words of the word set in the first question are { this: 1, box: 1, price: 2, expensive: 1, that: 1, appropriate: 1, not: 0, inexpensive: 0, more: 0 }, so the word frequency vector of the first question may be represented as (1,1,2,1,1,1,0,0,0).
The word frequencies of the words of the word set in the third question are { this: 1, box: 1, price: 1, expensive: 0, that: 1, appropriate: 1, not: 1, inexpensive: 1, more: 1 }, so the word frequency vector of the third question may be represented as (1,1,1,0,1,1,1,1,1).
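The example above can be reproduced with a short sketch. The two questions are given as pre-segmented word lists whose tokens are chosen to match the word frequencies listed in the text; the segmentation itself and the function names are assumptions for illustration.

```python
from collections import Counter
from typing import List

# Pre-segmented questions (tokens assumed from the listed word frequencies).
first_question = ["this", "box", "price", "expensive", "that", "price", "appropriate"]
third_question = ["this", "box", "price", "not", "inexpensive", "that", "more", "appropriate"]


def build_word_set(*questions: List[str]) -> List[str]:
    """Collect the distinct words of all questions in first-seen order."""
    return list(dict.fromkeys(word for q in questions for word in q))


def word_frequency_vector(question: List[str], word_set: List[str]) -> List[int]:
    """Count how often each word of the word set occurs in the question."""
    counts = Counter(question)
    return [counts[word] for word in word_set]


word_set = build_word_set(first_question, third_question)
first_vector = word_frequency_vector(first_question, word_set)
third_vector = word_frequency_vector(third_question, word_set)
```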
(6) Cosine similarity
Cosine similarity measures the difference between two individuals by the cosine of the angle between two vectors in a vector space. The closer the cosine is to 1, the closer the angle is to 0 degrees, that is, the more similar the two vectors are. In the embodiments of the present application, the similarity of two questions can be calculated as the cosine similarity of their word frequency vectors; the closer the result is to 1, the more similar the two questions are.
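As a worked example, the cosine similarity of the two word frequency vectors given earlier, (1,1,2,1,1,1,0,0,0) and (1,1,1,0,1,1,1,1,1), can be computed as below. The dot product is 6 and the vector lengths are 3 and √8, so the similarity is 6 / (3·√8), about 0.707. Returning 0.0 for an all-zero vector is an assumption of this sketch.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: an empty question is similar to nothing
    return dot / (norm_u * norm_v)


first_vector = [1, 1, 2, 1, 1, 1, 0, 0, 0]
third_vector = [1, 1, 1, 0, 1, 1, 1, 1, 1]
similarity = cosine_similarity(first_vector, third_vector)
```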
(7)TextRank
The TextRank algorithm is a graph-based ranking algorithm for text. Its basic idea derives from Google's PageRank algorithm: the text is divided into constituent units (words, questions), a graph model is built, and a voting mechanism ranks the important components of the text. Keyword extraction and summarization can thus be achieved using only the information in a single document.
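A minimal keyword-ranking sketch in the spirit of TextRank is shown below. It assumes an undirected, unweighted co-occurrence graph over pre-segmented words, and the `window`, `damping` and `iterations` values are illustrative defaults rather than parameters stated in the text.

```python
from collections import defaultdict
from typing import Dict, List, Set


def textrank_keywords(words: List[str], window: int = 2,
                      damping: float = 0.85, iterations: int = 50) -> List[str]:
    """Rank the distinct words of a text by a PageRank-style score."""
    # Link each word to the distinct words that follow it within the window.
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1:i + 1 + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)
    vocabulary = list(dict.fromkeys(words))
    score = {word: 1.0 for word in vocabulary}
    for _ in range(iterations):
        # Each neighbour "votes" with its score split over its own links.
        score = {
            word: (1 - damping) + damping * sum(
                score[n] / len(neighbours[n]) for n in neighbours[word])
            for word in vocabulary
        }
    return sorted(vocabulary, key=lambda w: score[w], reverse=True)
```

The first few entries of the returned list can then serve as the keywords of a question.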
Referring to fig. 1, fig. 1 is a schematic diagram of a dialogue system architecture according to an embodiment of the present invention. As shown in fig. 1, the system architecture diagram may include a first terminal 101, a first server 102, a second server 103, a question-answer database 104, and a second terminal 105. The first terminal 101 may receive a question of the user based on the client, where the question may be a text converted from a voice of the user, a text converted from an image sent by the user, or a text converted from another form, and the first terminal 101 sends a dialogue request including the question to the first server 102. The first terminal 101 may be a robot, a computer, a mobile phone, or an intelligent sound device, or may be other intelligent devices capable of receiving a user input, which is not limited herein. The first terminal 101 is further configured to receive response information corresponding to the first question returned by the first server 102.
The question-answer database 104 includes a plurality of questions and first answer information corresponding to each of the plurality of questions, and the questions in the question-answer database 104 may be subjective questions, objective questions of knowledge class, or other types of questions. The first response information may be text, voice, video, or other forms of information.
The first server 102 is used to provide services to clients. Specifically, the first server 102 may receive a dialogue request sent by the first terminal 101 for searching for a second question matching the first question, and respond to the dialogue request. Referring to fig. 2, fig. 2 is a schematic illustration of a response of the first server 102 to a dialogue request, where the first server 102 may parse the dialogue request to obtain a first question, search response information corresponding to the first question from the question-answer database 104, and if the response information corresponding to the first question is found, consider that the answer is hit, and the first server 102 may return the found response information, such as "answer a" shown in fig. 2, to the first terminal 101; if the answer information corresponding to the question is not found, the first server 102 generates the answer information of the first question through the QA system, and the first server 102 may return the generated answer information to the first terminal 101, such as "answer B" shown in fig. 2.
In some embodiments, the first server 102 is further configured to obtain questions of the historical responses of the first server 102 or other questions in a database, such as questions in a plurality of QQ chat records of users, where the first server 102 may deduplicate the questions of the historical responses to obtain a first question set, further group the questions in the first question set, and filter the question groups based on the frequency of each question group to obtain a high-frequency question group. The first server 102 may also generate at least one response message through the QA system for each question in the high-frequency question group and transmit each question in the high-frequency question group and at least one response message corresponding to each question to the second server 103.
The second server 103 may be a response-information calibration platform that supports manual calibration of the response information of received questions. After obtaining each question in the high-frequency question group and the at least one piece of response information corresponding to each question, the second server 103 may send the obtained one or more questions and the at least one piece of response information corresponding to each question to the second terminal 105. A user of the second terminal 105 may manually calibrate the response information corresponding to each received question, that is, obtain the first response information of each question, and the second terminal 105 may then send the one or more questions and the first response information corresponding to each question to the second server 103.
As shown in fig. 3, fig. 3 illustrates an exemplary graphical user interface (GUI), also referred to as a calibration interface, displayed on the second terminal 105. The calibration interface includes a calibration area 301, a calibration progress control 302, an input control 303, a save control 304, a delete control 305, and an export data control 306. The calibration area 301 displays a question to be calibrated and the at least one response message corresponding to that question; depending on the input operation, the question can be modified or the first response information corresponding to the question can be determined. For example, the question can be modified by double-clicking it in the calibration interface. As another example, when the second terminal 105 detects a selection operation on one of the at least one response message, the selected response message is used as the calibration result of the question, that is, the first response information corresponding to the question. Through the calibration progress control 302, the second terminal 105 displays the total number of questions to be calibrated in the second server 103 and the number of questions already calibrated; this progress information is updated continuously as calibration operations are performed in the calibration area 301. The second terminal 105 may also detect information entered into the input control 303 and, when such input is detected, use it as the calibration result of the question shown in the calibration area 301, that is, the first response information corresponding to the question.
When the second terminal 105 detects a user operation on the save control 304, it saves the calibration result of the question in the current calibration interface and switches to the next calibration interface. When the second terminal 105 detects a user operation on the delete control 305, it switches to the next calibration interface. When the second terminal 105 detects a user operation on the export data control 306, it exports all questions that have been calibrated together with their corresponding first response information. User operations may include, but are not limited to, clicking, double-clicking, and selecting. The second terminal 105 may send the calibration results, that is, each question and the first response information corresponding to each question, to the second server 103. The calibration interface for calibrating the first response information of a question is not limited to the one shown in fig. 3; other interface designs may also be used.
After receiving each question and the first response information corresponding to each question, the second server 103 may send them to the first server 102, and the first server 102 may update the question-answer database 104 with each received question and its first response information. The second terminal 105 may be a response information calibration terminal; it may be a robot, a computer, a mobile phone, a smart speaker, or another intelligent device capable of receiving user input, which is not limited herein.
In an alternative embodiment, one way of interaction of the second server 103 with the second terminal 105 is as follows:
The second server 103 sends a question and the at least one response message corresponding to the question to the second terminal 105, and the second terminal 105 displays, through the client, a calibration interface as shown in fig. 3. When the second terminal 105 detects an input operation on the save control 304, it saves the question of the calibration interface and the first response information corresponding to the question in a temporary cache, sends the question and the first response information in the temporary cache to the second server 103, and requests the calibration interface of the next question from the second server 103. The question and the first response information stored in the temporary cache are obtained by detecting input operations in the calibration area 301 and on the input control 303. For example, as shown in fig. 3, if the response message selected in the calibration area 301 is "No, but I do have a pair of big, bright eyes" and no input is detected in the input control 303, the temporary cache stores "Do you have eyes" and "No, but I do have a pair of big, bright eyes". As another example, if no selected response message is detected in the calibration area 301 and the input "I can see you" is detected in the input control 303, the temporary cache stores "Do you have eyes" and "I can see you". As a further example, if the response message selected in the calibration area 301 is "No, but I do have a pair of big, bright eyes" and the input "I can see you" is detected in the input control 303, the temporary cache stores "Do you have eyes" and "I can see you".
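The examples above suggest that text typed into the input control, when present, takes precedence over a response selected in the calibration area. A minimal sketch of that rule follows; the function name `resolve_calibration` and its parameters are hypothetical and do not appear in the original, and the precedence rule itself is an assumption read off the examples.

```python
from typing import Optional, Tuple


def resolve_calibration(question: str,
                        selected: Optional[str],
                        typed: Optional[str]) -> Optional[Tuple[str, str]]:
    """Return the (question, first response) pair to store in the temporary cache.

    Assumption: text typed into the input control overrides a response
    selected in the calibration area.
    """
    if typed:       # non-empty typed input wins
        return question, typed
    if selected:    # otherwise fall back to the selected candidate
        return question, selected
    return None     # nothing has been calibrated yet
```

Returning `None` when neither source is present leaves it to the caller to decide whether saving should be allowed at all.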
Alternatively, the first terminal 101 and the second terminal 105 may be the same device.
Alternatively, the first server 102 and the second server 103 may be the same device.
Alternatively, the question-answer database 104 may be located in the first server 102 or may exist separately. Not limited to the system architecture diagram shown in fig. 1, the dialogue system provided in the embodiments of the present application may further include other devices, for example, a third party server, which may be a server for detecting whether the problem is suspected of illegal contents. The first server 102 can provide third party data and third party functions for the dialogue system by interacting with the third party server, so as to further guarantee services provided by the dialogue system platform.
Referring to fig. 4, fig. 4 is a flow chart of a dialogue method according to an embodiment of the invention. As shown in fig. 4, the dialogue method may be implemented by the dialogue system shown in fig. 1, where the first terminal may be the first terminal 101, the first server may be the first server 102, the question-answer database may be the question-answer database 104, and the implementation of the dialogue method may include the following steps.
S101, the first terminal receives, through a client, a first question input by a user.
The client may be an APP on a mobile terminal, a browser on a computer, or other programs capable of providing services for users, which is not limited herein. The first question may be a voice input by the first terminal through a voice sensor (such as a microphone), a text input by the first terminal through an input device (such as a touch panel or a keyboard, etc.), or may have other formats, such as an image, etc., which is not limited herein.
S102, the first terminal generates a dialogue request according to the input first question.
The dialogue request is used to request response information corresponding to the first question from the first server. When the first question is not text, the first terminal may convert it to text, for example by converting input speech to text with a speech recognition algorithm, or by recognizing the question in an input image with an image recognition algorithm.
S103, the first terminal sends the generated dialogue request to the first server.
S104, the first server receives the dialogue request.
After receiving the dialogue request, the first server may parse the dialogue request and extract the first question from the dialogue request.
S105, the first server searches a second question matched with the first question in the question-answer database.
The first server searches the question-answer database for a second question matching the first question. The question-answer database may include N questions and first response information corresponding to each of the N questions, where N is a positive integer. When a second question matching the first question is found, S106 is performed: the first server sends the first response information corresponding to the second question to the first terminal. When no second question matching the first question is found, S107 is performed: the first server generates response information for the first question through the QA system and sends it to the first terminal.
The search of the question-answer database for the second question matching the first question may include the following three implementations:
Implementation mode (I):
The first server may calculate the similarity between the first question and each of the N questions to obtain a first similarity set, which contains the similarity between the first question and each of the N questions. The first server may then determine the maximum similarity in the first similarity set and take the question corresponding to the maximum similarity, among the N questions, as the question matching the first question, that is, the second question.
Implementation mode (II):
After obtaining the first similarity set, the first server may determine whether the maximum similarity is greater than a first threshold. If the maximum similarity is greater than the first threshold, the first server takes the question corresponding to the maximum similarity among the N questions as the second question and may perform S106; if the maximum similarity is not greater than the first threshold, the second question has not been found, and the first server may perform S107.
Implementation mode (III):
After obtaining the first similarity set, the first server may screen out, from the N questions, the questions whose similarity is greater than a preset threshold, obtaining a set of one or more questions matching the first question, and select one of them as the second question. If this set of similar questions is empty, the first server determines that the second question is not found in the question-answer database.
Optionally, when the first server receives the same first question from the first terminal multiple times, it may each time return the first response information corresponding to an arbitrarily chosen one of the questions matching the first question, so that different response information can be given for the same question.
In practical applications, the first threshold or the preset threshold may be determined according to the matching accuracy required.
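The three implementation modes differ only in how the first similarity set is turned into a decision. The following is a minimal sketch under stated assumptions: the function name, the `pick_any` flag, and the injected `similarity` callable are all hypothetical, and the random choice in mode (III) is one possible reading of "select one of them".

```python
import random
from typing import Callable, Optional, Sequence


def find_second_question(first_question: str,
                         questions: Sequence[str],
                         similarity: Callable[[str, str], float],
                         threshold: Optional[float] = None,
                         pick_any: bool = False,
                         rng: Optional[random.Random] = None) -> Optional[str]:
    """Return a stored question matching first_question, or None on a miss.

    threshold=None                 -> mode (I): always take the maximum
    threshold=t                    -> mode (II): the maximum must exceed t
    threshold=t and pick_any=True  -> mode (III): any question above t
    """
    if not questions:
        return None
    # First similarity set: one score per stored question.
    scores = [similarity(first_question, q) for q in questions]
    if pick_any and threshold is not None:
        candidates = [q for q, s in zip(questions, scores) if s > threshold]
        # An empty candidate list means the second question was not found.
        return (rng or random).choice(candidates) if candidates else None
    best = max(range(len(questions)), key=scores.__getitem__)
    if threshold is not None and scores[best] <= threshold:
        return None
    return questions[best]
```

Passing the similarity function as a parameter keeps the matching policy independent of how similarity is computed.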
In the above three implementation manners, the embodiment of the present application uses the calculation of the similarity between the first question and the third question as an example to describe a calculation manner of the similarity between the first question and any one of the N questions in the question-answering database. Wherein the third problem may be any one of the N problems. Referring to fig. 5, fig. 5 is a method for calculating the similarity between the first problem and the third problem, and the method for implementing the similarity can be performed by the first server 102 shown in fig. 1, and the method for calculating the similarity between the first problem and the third problem includes the following steps.
S1051, performing word segmentation on the first problem and the third problem to obtain a first word set.
The word segmentation processing refers to splitting the first problem and the third problem into one or more independent words, wherein the words can be nouns, verbs, adjectives, and words with any other parts of speech. The first word set refers to a set of words obtained by splitting the first problem and the third problem, wherein the first word set does not comprise repeated words.
For example, the first question is: "the price of this box is expensive, that price is appropriate". Word segmentation of the first question yields {this, box, price, expensive, that, price, appropriate}. The third question is: "the price of this box is not cheap, that one is more appropriate". Word segmentation of the third question yields {this, box, price, not, cheap, that, more, appropriate}. The first word set is then {this, box, price, expensive, that, appropriate, not, cheap, more}.
The word frequencies of the words of the first word set in the first question are {this: 1, box: 1, price: 2, expensive: 1, that: 1, appropriate: 1, not: 0, cheap: 0, more: 0}, so the word frequency vector of the first question may be represented as (1,1,2,1,1,1,0,0,0).
The word frequencies of the words of the first word set in the third question are {this: 1, box: 1, price: 1, expensive: 0, that: 1, appropriate: 1, not: 1, cheap: 1, more: 1}, so the word frequency vector of the third question may be represented as (1,1,1,0,1,1,1,1,1).
S1052, representing the word frequency vector of the first question and the word frequency vector of the third question in the space formed by the first word set.
In one implementation of the embodiment of the present application, the first server may calculate the word frequency of each word of the first word set in the first question, and the word frequency of each word of the first word set in the third question.
The word frequency of a word of the first word set in the first question is the number of times that word appears in the first question. For example, the word frequencies of the words of the first word set in the first question are {this: 1, box: 1, price: 2, expensive: 1, that: 1, appropriate: 1, not: 0, cheap: 0, more: 0}, and the word frequency vector of the first question is (1,1,2,1,1,1,0,0,0).
The word frequency of a word of the first word set in the third question is the number of times that word appears in the third question. For example, the word frequencies of the words of the first word set in the third question are {this: 1, box: 1, price: 1, expensive: 0, that: 1, appropriate: 1, not: 1, cheap: 1, more: 1}, and the word frequency vector of the third question is (1,1,1,0,1,1,1,1,1).
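Steps S1051 and S1052 can be sketched as follows. This is an illustration, not the patented implementation: the function name is hypothetical, word segmentation is assumed to have been done already, and the English token lists are stand-ins chosen to reproduce the word frequency vectors given in the example.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def word_frequency_vectors(words_a: Sequence[str],
                           words_b: Sequence[str]) -> Tuple[List[str], List[int], List[int]]:
    """Build the shared first word set and the two word frequency vectors.

    Inputs are already-segmented word lists; segmentation is out of scope here.
    """
    # dict.fromkeys removes repeated words while keeping first-seen order.
    vocabulary = list(dict.fromkeys([*words_a, *words_b]))
    count_a, count_b = Counter(words_a), Counter(words_b)
    vec_a = [count_a[w] for w in vocabulary]   # a missing word counts as 0
    vec_b = [count_b[w] for w in vocabulary]
    return vocabulary, vec_a, vec_b


# Illustrative token lists for the first and third questions.
first = ["this", "box", "price", "expensive", "that", "price", "appropriate"]
third = ["this", "box", "price", "not", "cheap", "that", "more", "appropriate"]
vocabulary, vec_first, vec_third = word_frequency_vectors(first, third)
```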
S1053, calculating the similarity between the first problem and the third problem.
Wherein the similarity indicates the degree of similarity between the two questions. Alternatively, the similarity between the first problem and the third problem may be calculated by a cosine similarity formula, that is, the cosine similarity between the word frequency vector of the first problem and the word frequency vector of the third problem is calculated, so as to obtain the similarity between the first problem and the third problem. The cosine similarity formula is as follows:
$$\cos\theta = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}}\,\sqrt{\sum_{i=1}^{n} y_i^{2}}}$$
where $x_i$ is the i-th component of the word frequency vector of the first question, $y_i$ is the i-th component of the word frequency vector of the third question, i and n are positive integers with $1 \le i \le n$, and n is the length of the word frequency vectors of the first question and the third question. For example, for the word frequency vectors (1,1,2,1,1,1,0,0,0) of the first question and (1,1,1,0,1,1,1,1,1) of the third question, the similarity is $\cos\theta = 6/(3\sqrt{8}) \approx 0.71$.
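The cosine similarity of S1053 can be computed directly from two word frequency vectors. The sketch below uses only the standard library; the function name and the handling of all-zero vectors are assumptions, since the original does not say what to do when a vector has zero length.

```python
import math
from typing import Sequence


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length word frequency vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: an all-zero vector is treated as dissimilar
    return dot / (norm_x * norm_y)
```

Because word frequencies are non-negative, the result always lies between 0 and 1, which is what allows thresholds close to 1 to express a strict match.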
Alternatively, the similarity may be obtained by other methods for calculating the similarity, for example, the Jaccard distance and the Dice coefficient, that is, the similarity of two questions may be calculated according to the number of the same words, which is not limited herein.
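The set-based alternatives mentioned above count shared words rather than comparing vectors. A minimal sketch, with hypothetical function names, is given below; it computes the Jaccard similarity (one minus the Jaccard distance) and the Dice coefficient over the sets of distinct words, and returning 0.0 for two empty inputs is an assumption.

```python
from typing import Iterable


def jaccard_similarity(words_a: Iterable[str], words_b: Iterable[str]) -> float:
    """Shared distinct words divided by all distinct words."""
    set_a, set_b = set(words_a), set(words_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def dice_coefficient(words_a: Iterable[str], words_b: Iterable[str]) -> float:
    """Twice the shared distinct words divided by the sum of both set sizes."""
    set_a, set_b = set(words_a), set(words_b)
    total = len(set_a) + len(set_b)
    return 2 * len(set_a & set_b) / total if total else 0.0
```

Unlike cosine similarity on word frequency vectors, both measures ignore how often a word is repeated within a question.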
By the above method, the similarity of the first problem and each of the N problems can be obtained.
S106, the first server sends first response information corresponding to the second question in the question-answer database to the first terminal.
S107, the first server generates response information aiming at the first problem through the QA system.
In some possible implementations, the QA system generates the response information for the first question in three stages: question understanding, document retrieval, and answer extraction. First, question understanding classifies the question, analyses its semantics, and generates a corresponding question concept graph. Second, document retrieval searches a network knowledge base for content related to the question, based on that understanding. Finally, answer extraction works on the retrieval results: it clusters the concept graphs and ranks the answers to obtain the response information corresponding to the question.
Alternatively, the manner in which the QA system generates the corresponding response information for the question may be other manners, which are not limited herein.
S108, the first server transmits response information generated by the QA system to the first terminal.
S109, the first terminal receives response information sent by the first server.
The received response information may be first response information corresponding to the second question in the question-answer database, or may be response information for the first question generated by the QA system.
S110, the first terminal outputs the received response information.
The manner in which the first terminal outputs the response information may include, but is not limited to, outputting through a display, outputting through a sound, outputting through a loudspeaker, or outputting through other manners, where the output form of the first terminal is not limited.
It should be noted that, in some embodiments of the present application, steps S107-S108 are not necessary, and S107-S108 may not be included.
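The server-side branch of the flow (S105 to S108) can be summarized in a few lines. This is a sketch under stated assumptions: the function `respond`, the dictionary-based database, and the two injected callables are hypothetical, and passing no QA system models the embodiments in which S107-S108 are omitted.

```python
from typing import Callable, Dict, Optional


def respond(first_question: str,
            qa_database: Dict[str, str],
            find_match: Callable[[str], Optional[str]],
            qa_system: Optional[Callable[[str], str]] = None) -> Optional[str]:
    """Return stored first response information on a hit, else fall back to the QA system."""
    second_question = find_match(first_question)   # S105: search the database
    if second_question is not None:
        return qa_database[second_question]        # S106: answer from the database
    if qa_system is not None:
        return qa_system(first_question)           # S107-S108: generated answer
    return None  # embodiment without S107-S108: nothing to send
```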
In this embodiment of the present application, N questions and first answer information corresponding to the N questions in the question-answer database may be pre-stored in the first server, or may be a database generated by continuously updating the questions and the first answer information corresponding to the questions. One implementation of the question-answer database update or generation is described below.
As shown in fig. 6, fig. 6 is a schematic flow chart of updating or generating a question-answer database according to an embodiment of the present invention. The method for updating the question-answer database can be implemented by the dialogue system shown in fig. 1, the first server can be the first server 102, the second server can be the second server 103, and the first server and the second server can also be the same device; the question-answer database may be the question-answer database 104; the second terminal may be the second terminal 105, or the second terminal may be the same device as the first terminal. As shown in fig. 6, the method may include the following steps.
S601, a first server acquires a first problem set.
In some embodiments, the first server 102 may obtain the questions of the historical responses of the first server 102 or other questions in a database, such as questions in a plurality of user QQ chat records, and the first server 102 may perform a deduplication operation on the questions of the historical responses to obtain the first set of questions.
It should be appreciated that the first set of questions is a set of questions that the first server has historically answered, the first set of questions not including duplicate questions. The problems of historical responses may include duplicate problems.
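Deduplication in S601 can keep the number of occurrences of each question, which is useful later when group frequencies are summed. A minimal sketch, assuming that two questions are duplicates only when their text is identical (the original does not define duplication) and using a hypothetical function name:

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def build_first_question_set(history: Iterable[str]) -> Tuple[List[str], Dict[str, int]]:
    """Deduplicate historically answered questions and count each one's occurrences."""
    frequency = Counter(history)
    # Counter keeps first-seen order (Python 3.7+), so the list is deduplicated in order.
    return list(frequency), dict(frequency)
```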
S602, the first server divides the questions in the first question set into L question groups, wherein L is a positive integer. Wherein S602 may include, but is not limited to, the following implementation, as shown in fig. 7, fig. 7 is a method for grouping a first problem set, and one implementation of S602 may include the following steps:
S6021, calculating the similarity between a fourth question and each question in a second question set.
The second question set is the set of questions in the first question set that have not yet been grouped, and the fourth question is one question in the second question set.
It should be appreciated that when grouping starts, no question has been grouped yet, so the second question set is the first question set and the fourth question may be any question in the first question set.
S6022, placing the questions in the second question set whose similarity to the fourth question is greater than a third threshold into one question group.
S6023, determining whether the number of questions in the second question set is smaller than M.
Here M is a positive integer, such as 1, 2, 3, 4, or 6. When the number of questions in the second question set is not smaller than M, the first server returns to S6021 and calculates the similarity between a fourth question and each question in the updated second question set; when the number is smaller than M, grouping stops, and all groups obtained so far are the L question groups. The third threshold in this implementation of S602 is set according to how similar the questions within a group are required to be: the stricter the requirement, the closer the third threshold is set to 1.
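The grouping loop of S6021 to S6023 can be sketched as a greedy partition. The names below are hypothetical, and two details are assumptions not fixed by the original: the fourth question is taken to be the first ungrouped question, and it is placed in the group it seeds.

```python
from typing import Callable, List, Sequence


def group_questions(first_question_set: Sequence[str],
                    similarity: Callable[[str, str], float],
                    third_threshold: float,
                    m: int = 1) -> List[List[str]]:
    """Greedily split questions into groups of questions similar to a seed question."""
    ungrouped = list(first_question_set)      # the second question set
    groups: List[List[str]] = []
    while ungrouped and len(ungrouped) >= m:  # S6023: stop when fewer than M remain
        fourth = ungrouped[0]                 # assumption: seed with the first one
        group, remaining = [fourth], []
        for q in ungrouped[1:]:               # S6021: compare with every other question
            if similarity(fourth, q) > third_threshold:
                group.append(q)               # S6022: similar enough, same group
            else:
                remaining.append(q)
        groups.append(group)
        ungrouped = remaining
    return groups
```

Each question is compared only with the seed of the current group, so the result depends on the order in which seeds are chosen.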
The similarity of two questions in S6021 may be calculated by, but is not limited to, the following method. As shown in fig. 8, fig. 8 illustrates a method for calculating the similarity of two questions during grouping, described here for the fourth question and a fifth question. The method may be performed by the first server 102 shown in fig. 1 and includes some or all of the following steps.
S60211, screening out a plurality of keywords from the fourth question and the fifth question.
In one implementation of the embodiment of the present application, the first server may determine the keyword of the fourth problem according to the weight of each word in the fourth problem, and determine the keyword of the fifth problem according to the weight of each word in the fifth problem. In a specific implementation, the keywords of the fourth problem may be S keywords with the greatest weights among all the words in the fourth problem. The keywords of the fifth question may be S keywords having the greatest weights among all the words in the fifth question. Where S is a positive integer, and the value of S depends on the accuracy of the similarity required for the fourth problem and the fifth problem.
The combination of the keywords of the fourth problem and the keywords of the fifth problem is the plurality of keywords. Wherein the weight of the first word in the fourth question represents the contribution of the first word to the semantics of the fourth question, the first word being any word in the fourth question; the weight of the second word in the fifth question represents the contribution of the second word to the semantics of the fifth question, the second word being any one of the words in the fifth question.
The weight of the first word R1 in the fourth question q4, denoted $W_{R1,q4}$, may be represented by the TF-IDF of R1 in q4:
$$W_{R1,q4} = TF_{R1,q4} \times IDF_{R1,Q}$$
where $TF_{R1,q4}$ is the word frequency of the first word R1 in the fourth question q4 and indicates the importance of R1 to the semantics of q4;
$IDF_{R1,Q}$ is the inverse document frequency of the first word R1 in the first question set Q and indicates the category discrimination capability of R1 in Q.
It should be understood that the calculation manner of the weight of the second word in the fifth problem may refer to the related description in the calculation manner of the weight of the first word in the fourth problem, which is not repeated herein.
For example, the fourth question is "I like to listen to music", and the fifth question is "I do not like writing brushes, nor do I like pens". The words in each question are sorted by weight from largest to smallest, and the three highest-ranked words in each question are selected as keywords. For example, the keywords of the fourth question are {I, like, music}, and the keywords of the fifth question are {writing brush, pen, don't}. The plurality of keywords obtained is then {I, like, music, writing brush, pen, don't}.
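Keyword screening in S60211 can be sketched as a TF-IDF weighting followed by a top-S selection. The original gives no formula for the inverse document frequency, so the smoothed variant below is an assumption, as are the function names; word frequency is taken as the raw count within the question.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def tf_idf_weights(question: Sequence[str],
                   question_set: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Weight each word of a segmented question by TF x IDF over the question set."""
    tf = Counter(question)                  # raw count of each word in the question
    n = len(question_set)
    weights: Dict[str, float] = {}
    for word, count in tf.items():
        df = sum(1 for q in question_set if word in q)  # questions containing the word
        idf = math.log((1 + n) / (1 + df)) + 1          # assumed smoothed IDF variant
        weights[word] = count * idf
    return weights


def top_keywords(weights: Dict[str, float], s: int) -> List[str]:
    """Return the S words with the greatest weights."""
    return sorted(weights, key=weights.get, reverse=True)[:s]
```

With this weighting, a word that appears in every question of the set receives the lowest IDF, so rarer words are preferred as keywords.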
S60212, representing the word frequency vector of the fourth question and the word frequency vector of the fifth question in the space formed by the plurality of keywords.
The word frequency vector of the fourth question is the vector expression of the fourth question in the space formed by the plurality of keywords; similarly, the word frequency vector of the fifth question is the vector expression of the fifth question in that space. The word frequency vector of the fourth question is obtained from the number of times each of the plurality of keywords occurs in the fourth question, and the word frequency vector of the fifth question is obtained from the number of times each of the plurality of keywords occurs in the fifth question.
For example, the keyword "I" occurs once in the fourth question. The frequencies of the other keywords in the fourth question are obtained in the same way, giving the word frequency vector (1,1,1,0,0,0) for the fourth question. Similarly, the word frequency vector of the fifth question is (1,2,0,1,1,2).
S60213, calculating the cosine similarity between the word frequency vector of the fourth question and the word frequency vector of the fifth question to obtain the similarity between the fourth question and the fifth question.
For the calculation of the cosine similarity, refer to the related description in step S1053. For example, from the word frequency vector of the fourth question and the word frequency vector of the fifth question, the similarity between the fourth question and the fifth question is obtained as cosθ=0.52.
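As a check, the value in this example can be worked out by hand from the two word frequency vectors (1,1,1,0,0,0) and (1,2,0,1,1,2) using the standard cosine similarity formula:

$$\cos\theta = \frac{1\cdot 1 + 1\cdot 2 + 1\cdot 0 + 0\cdot 1 + 0\cdot 1 + 0\cdot 2}{\sqrt{1^2+1^2+1^2}\,\sqrt{1^2+2^2+0^2+1^2+1^2+2^2}} = \frac{3}{\sqrt{3}\,\sqrt{11}} \approx 0.52$$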
Alternatively, the implementation of calculating the similarity between the two questions in the above implementation may be implemented based on TextRank according to the classification of the key phrase, or other methods for calculating the similarity may be used, which is not limited herein.
S603, the first server screens out K question groups from the L question groups.
The first server screens out, from the L question groups, K question groups whose total frequency is greater than a second threshold. The total frequency of a first question group is the sum of the frequencies with which the first server has historically answered each question in the first question group, where the first question group is any one of the L question groups. That is, the total frequency of each question group is the total number of questions in the question group, counting repeated occurrences. The second threshold is adjusted according to the number of similar questions required in a question group, for example 4, 5, 6, or 7. K is a positive integer whose value depends on the second threshold.
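The screening in S603 reduces to summing per-question frequencies within each group and comparing the sum with the second threshold. A minimal sketch with hypothetical names, assuming a mapping from each question to the number of times it was historically answered is available:

```python
from typing import Dict, List, Sequence


def screen_high_frequency_groups(groups: Sequence[Sequence[str]],
                                 frequency: Dict[str, int],
                                 second_threshold: int) -> List[List[str]]:
    """Keep the groups whose total historical frequency exceeds the second threshold."""
    kept: List[List[str]] = []
    for group in groups:
        # Total frequency of the group: sum over the questions it contains.
        total = sum(frequency.get(q, 0) for q in group)
        if total > second_threshold:
            kept.append(list(group))
    return kept
```

The number of groups returned is K, so raising the second threshold can only keep K the same or make it smaller.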
S604, the first server generates corresponding at least one response message for each question in the K question groups. The at least one response message corresponding to each question may be generated by the QA system, or may be generated by another interactive robot, and is not limited herein.
S605, the first server sends questions in the K question groups and at least one response message corresponding to each question to the second server.
S606, the second server receives questions in the K question groups and at least one response message corresponding to each question.
S607, the second server transmits the sixth problem and at least one response message corresponding to the sixth problem to the second terminal.
The sixth problem is any one of the K problem groups.
S608, the second terminal receives the sixth question and the at least one response message corresponding to the sixth question, and displays a calibration interface containing them.
In some embodiments of the present application, the second terminal displays the calibration interface shown in fig. 3, and specifically, reference may be made to the related description in fig. 3, which is not repeated herein. Note that, the display of the sixth problem and at least one response message corresponding to the sixth problem by the second terminal is not limited to the display method shown in fig. 3.
S609, the second terminal receives the sixth question and the first response information corresponding to the sixth question.
The first response information corresponding to the sixth question may be selected response information from at least one response information corresponding to the sixth question or response information input for the sixth question, referring to fig. 3.
S610, the second terminal sends a sixth question and first response information corresponding to the sixth question to the second server.
S611, the second server receives the sixth question and the first response information corresponding to the sixth question sent by the second terminal.
S612, the second server sends the sixth question and the first response information corresponding to the sixth question to the first server.
S613, the first server receives the sixth question and the first response information corresponding to the sixth question.
The first server updates the received sixth question and the first response information corresponding to the sixth question to the question-answer database.
It should be noted that, in some embodiments of the present application, the first server and the second server may be the same device, and steps S605, S606, and S612 may not be executed at this time. Alternatively, the second terminal and the first terminal may be the same device.
The following describes an apparatus and device according to an embodiment of the present application.
Fig. 9a is a schematic structural diagram of a dialogue device according to an embodiment of the invention. As shown in fig. 9a, the dialogue device 900a may be applied to the first server in the embodiment corresponding to fig. 4 or fig. 6, and the device 900a may include:
a receiving unit 901, configured to receive a dialogue request sent by a first terminal, where the dialogue request is used to request response information corresponding to a first question from the server;
a searching unit 902, configured to search for a second question that matches the first question from N questions in a question-answer database, where the question-answer database includes N questions and first answer information corresponding to each of the N questions, and N is a positive integer;
a sending unit 903, configured to send, when the second question is found, first answer information corresponding to the second question in the question-answer database to the first terminal;
a first generating unit 904 for generating response information for the first question by a question-and-answer system (question answering system, QA system) when the second question is not found;
the sending unit 903 is further configured to send the generated response information to the first terminal.
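The cooperation of the units above can be sketched as a single dispatch function. This is a minimal sketch only: the names `handle_dialogue_request`, `find_match`, and `qa_system` are hypothetical, the question-answer database is assumed to be an in-memory dictionary, and the matching strategy and the QA system are passed in as callables because they are described elsewhere.

```python
from typing import Callable, Dict, Optional


def handle_dialogue_request(
    first_question: str,
    qa_database: Dict[str, str],
    find_match: Callable[[str, Dict[str, str]], Optional[str]],
    qa_system: Callable[[str], str],
) -> str:
    """Return response information for first_question.

    qa_database maps each stored question to its first response information.
    find_match and qa_system are hypothetical stand-ins for the searching
    unit and the QA system; they are assumptions of this sketch.
    """
    # Searching unit: look for a stored second question matching the first.
    second_question = find_match(first_question, qa_database)
    if second_question is not None:
        # Sending unit: reply with the stored first response information.
        return qa_database[second_question]
    # First generating unit: fall back to the QA system.
    return qa_system(first_question)
```

In this sketch the stored response is preferred whenever a match exists, so the QA system is invoked only for questions the database cannot answer.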
In one implementation of the embodiment of the present application, the searching unit 902 is specifically configured to:
determining the maximum similarity in a first similarity set according to the similarity between the first question and each of the N questions, wherein the first similarity set comprises the similarity between the first question and each of the N questions;
if the maximum similarity is greater than a first threshold, determining that the second question is the question corresponding to the maximum similarity;
and if the maximum similarity is smaller than the first threshold, determining that the second question is not found in the question-answer database.
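The threshold test above can be illustrated as follows. This is a sketch under stated assumptions: `find_second_question` and the `similarity` callable are hypothetical names, and the case where the maximum similarity exactly equals the first threshold, which the text leaves open, is treated here as "not found".

```python
from typing import Callable, Iterable, Optional


def find_second_question(
    first_question: str,
    questions: Iterable[str],
    similarity: Callable[[str, str], float],
    first_threshold: float,
) -> Optional[str]:
    """Return the stored question with the maximum similarity, or None."""
    best_question: Optional[str] = None
    best_similarity = float("-inf")
    # Scan the first similarity set and keep the maximum.
    for candidate in questions:
        score = similarity(first_question, candidate)
        if score > best_similarity:
            best_question, best_similarity = candidate, score
    # Assumption: equality with the threshold counts as "not found".
    if best_question is not None and best_similarity > first_threshold:
        return best_question
    return None
```

Any similarity measure returning a number can be plugged in, including the cosine similarity of word frequency vectors.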
In one implementation of the embodiment of the present application, a third question is any one of the N questions, and the searching unit 902 is further configured to:
calculate the cosine similarity of the word frequency vector of the first question and the word frequency vector of the third question to obtain the similarity of the first question and the third question.
Fig. 9b is a schematic structural diagram of another dialogue device according to an embodiment of the invention. As shown in fig. 9b, in an implementation of an embodiment of the present application, the apparatus 900b may further include, in addition to the respective units in the foregoing apparatus 900a:
an acquiring unit 905, configured to acquire a first question set, where the first question set is a set of questions that the server has historically responded to;
a grouping unit 906, configured to group the questions in the first question set to obtain L question groups;
and a screening unit 907, configured to screen K question groups from the L question groups according to the total frequency of each question group in the L question groups, where the total frequency of each of the K question groups is greater than a second threshold, the total frequency of a first question group is the sum of the frequencies with which the server has historically responded to each question in the first question group, and the first question group is any one question group in the L question groups.
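The screening step can be sketched as below. The function name `screen_question_groups` and the representation of the response history as a dictionary from question to response count are assumptions made for illustration.

```python
from typing import Dict, List


def screen_question_groups(
    groups: List[List[str]],
    frequency: Dict[str, int],
    second_threshold: int,
) -> List[List[str]]:
    """Keep the question groups whose total frequency exceeds the threshold.

    frequency maps each question to the number of times the server has
    historically responded to it (an assumed representation).
    """
    selected = []
    for group in groups:
        # Total frequency of a group: sum over the questions it contains.
        total = sum(frequency.get(question, 0) for question in group)
        if total > second_threshold:
            selected.append(group)
    return selected
```

The groups returned correspond to the K question groups; questions missing from the history are counted as zero in this sketch.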
In one implementation of the embodiment of the present application, the grouping unit 906 is specifically configured to:
calculating the similarity of a fourth question and each question in a second question set, where the second question set is the set of questions in the first question set that are not currently grouped, and the fourth question is one question in the second question set;
dividing the questions in the second question set whose similarity with the fourth question is greater than a third threshold into one question group.
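One way to read the grouping step is as a greedy loop over the ungrouped questions. The sketch below makes several assumptions not stated in the text: the fourth question is always the first ungrouped question, the fourth question always joins its own group, and the step repeats until no ungrouped question remains. The names `group_questions` and `similarity` are hypothetical.

```python
from typing import Callable, List, Sequence


def group_questions(
    first_question_set: Sequence[str],
    similarity: Callable[[str, str], float],
    third_threshold: float,
) -> List[List[str]]:
    """Greedily partition questions into groups of mutually similar items."""
    ungrouped = list(first_question_set)  # plays the role of the second question set
    groups: List[List[str]] = []
    while ungrouped:
        fourth_question = ungrouped[0]  # assumption: take the first ungrouped question
        group = [fourth_question]
        remaining = []
        for question in ungrouped[1:]:
            if similarity(fourth_question, question) > third_threshold:
                group.append(question)
            else:
                remaining.append(question)
        groups.append(group)
        ungrouped = remaining  # questions still not grouped
    return groups
```

Because each pass removes at least the fourth question from the ungrouped list, the loop always terminates, and every question ends up in exactly one group.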
In one implementation of the embodiment of the present application, a fifth question is any one question in the second question set, and the grouping unit 906, when calculating the similarity between the fourth question and each question in the second question set, is specifically configured to perform:
determining keywords of the fourth question according to the weight of each word in the fourth question, where the weight of a first word represents the contribution of the first word to the semantics of the fourth question, and the first word is any word in the fourth question;
determining keywords of the fifth question according to the weight of each word in the fifth question, where the weight of a second word represents the contribution of the second word to the semantics of the fifth question, and the second word is any word in the fifth question;
determining word frequency vectors of the fourth question and the fifth question according to the keywords of the fourth question and the keywords of the fifth question;
and calculating the cosine similarity of the word frequency vector of the fourth question and the word frequency vector of the fifth question to obtain the similarity of the fourth question and the fifth question.
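The four steps above can be sketched as follows. Several details are assumptions of this sketch rather than statements of the text: the questions are already segmented into word lists, the word weights are supplied as a dictionary, the keywords are the `top_k` highest-weighted distinct words, and the vector dimensions are the union of both keyword sets. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def keywords(words: List[str], weights: Dict[str, float], top_k: int) -> List[str]:
    """Pick the top_k distinct words by weight (ties broken alphabetically)."""
    unique = sorted(set(words), key=lambda w: (-weights.get(w, 0.0), w))
    return unique[:top_k]


def keyword_cosine_similarity(
    fourth_words: List[str],
    fifth_words: List[str],
    weights: Dict[str, float],
    top_k: int = 3,
) -> float:
    """Cosine similarity of word frequency vectors built over the keywords."""
    # Vector dimensions: union of the keywords of both questions.
    vocabulary = sorted(
        set(keywords(fourth_words, weights, top_k))
        | set(keywords(fifth_words, weights, top_k))
    )
    fourth_counts, fifth_counts = Counter(fourth_words), Counter(fifth_words)
    u = [fourth_counts[w] for w in vocabulary]  # word frequency vector, fourth question
    v = [fifth_counts[w] for w in vocabulary]   # word frequency vector, fifth question
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Restricting the vectors to keywords means that low-weight words such as function words do not affect the similarity in this sketch.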
In one implementation of the embodiment of the present application, the apparatus 900a or the apparatus 900b may further include:
a second generating unit 908, configured to generate at least one response message for a sixth question, where the sixth question is any one question in the K question groups;
the sending unit 903 is further configured to send the sixth question and the at least one response message to a second terminal, so that the second terminal receives and displays the sixth question and the at least one response message, and sends first response information corresponding to the sixth question to the server, where the first response information corresponding to the sixth question is response information selected from the at least one response message received by the second terminal, or response information input for the sixth question and received by the second terminal;
the receiving unit 901 is further configured to receive the first response information corresponding to the sixth question, and update the sixth question and the first response information corresponding to the sixth question to the question-answer database.
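The review loop described above can be sketched as a single function. The names `curate_answer`, `generate_candidates`, and `review` are hypothetical; `review` stands in for the second terminal, where a candidate is selected or a new response is entered, and the question-answer database is assumed to be an in-memory dictionary.

```python
from typing import Callable, Dict, List


def curate_answer(
    sixth_question: str,
    generate_candidates: Callable[[str], List[str]],
    review: Callable[[str, List[str]], str],
    qa_database: Dict[str, str],
) -> str:
    """Generate candidates, obtain the reviewed response, and store it."""
    # Second generating unit: at least one candidate response message.
    candidates = generate_candidates(sixth_question)
    # Second terminal: returns a selected candidate or a newly entered response.
    first_response = review(sixth_question, candidates)
    # Receiving unit: update the question-answer database.
    qa_database[sixth_question] = first_response
    return first_response
```

After the update, a later dialogue request matching the sixth question can be answered directly from the database.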
It should be understood that, for a specific functional implementation manner of each functional unit, reference may be made to the related description in the corresponding embodiment of fig. 4 or fig. 6, and no detailed description is given here.
Fig. 10 is a schematic structural diagram of a dialogue device according to an embodiment of the present invention. As shown in fig. 10, the dialogue device 1000 may be applied to the first terminal in the embodiment corresponding to fig. 4 or fig. 6, and the device 1000 may include:
a generating unit 1001, configured to generate a dialogue request according to an input first question, where the dialogue request is used to request response information corresponding to the first question from a server;
a sending unit 1002, configured to send a dialogue request to the server, so that after the server receives the dialogue request, the server searches for a second question that matches the first question from N questions in a question-answer database, and if the second question is found, the server sends first answer information corresponding to the second question in the question-answer database to the first terminal, and if the second question is not found, the server generates answer information for the first question through a QA system and sends the generated answer information to the first terminal; the question-answer database comprises N questions and first answer information corresponding to each question in the N questions, wherein N is a positive integer;
a receiving unit 1003, configured to receive and output the received response information.
In one implementation of the embodiment of the present application, the receiving unit 1003 is further configured to receive and display a sixth question sent by the server and at least one response message corresponding to the sixth question;
the receiving unit 1003 is further configured to receive first response information corresponding to the sixth question, where the first response information corresponding to the sixth question is response information selected from the at least one response message or response information input for the sixth question;
the sending unit 1002 is further configured to send the sixth question and the first response information corresponding to the sixth question to the server, so that the server updates the sixth question and the first response information corresponding to the sixth question to the question-answer database.
It should be understood that, for a specific functional implementation manner of each functional unit, reference may be made to the related description in the corresponding embodiment of fig. 4 or fig. 6, and no detailed description is given here.
Fig. 11 is a schematic diagram of another dialogue device 1100 according to an embodiment of the present invention. The dialogue device 1100 may specifically be the first server 102 in fig. 1, and may include: a processor 1101, a bus 1102, a user interface 1103, a network interface 1104 and a memory 1105. The bus 1102 is used to enable connection and communication among these components. The user interface 1103 may optionally include a display screen and a keyboard. The network interface 1104 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). As shown in fig. 11, an operating system, a network communication module, a user interface module, and a device control application program may be included in the memory 1105, which is a type of computer-readable storage medium. In the dialogue device 1100 of fig. 11, the network interface 1104 may provide network communication functionality, and the processor 1101 may be configured to invoke the device control application stored in the memory 1105 to implement:
receiving, through the network interface 1104, a dialogue request sent by a first terminal, wherein the dialogue request is used for requesting, from the server, response information corresponding to a first question;
searching for a second question that matches the first question among N questions of a question-answer database, wherein the question-answer database comprises the N questions and first response information corresponding to each of the N questions, and N is a positive integer;
in the case of finding the second question, sending the first response information corresponding to the second question in the question-answer database to the first terminal through the network interface 1104;
when the second question is not found, generating response information for the first question through a question answering system (QA system), and transmitting the generated response information to the first terminal through the network interface 1104.
In an implementation of the embodiment of the present application, a third question is any one of the N questions, and the processor 1101 is further configured to perform:
calculating the cosine similarity of the word frequency vector of the first question and the word frequency vector of the third question to obtain the similarity of the first question and the third question.
In one implementation of the embodiment of the present application, the processor 1101 is further configured to perform:
acquiring a first question set, wherein the first question set is a set of questions that the server has historically responded to;
grouping the questions in the first question set to obtain L question groups;
and screening, from the L question groups, K question groups whose total frequencies are larger than a second threshold according to the total frequency of each question group in the L question groups, wherein the total frequency of a first question group is the sum of the frequencies with which the server has historically responded to all questions in the first question group, and the first question group is any one question group in the L question groups.
In an implementation of the embodiment of the present application, the processor 1101 is further configured to perform:
generating at least one response message for a sixth question, where the sixth question is any one question in the K question groups;
transmitting the sixth question and the at least one response message to a second terminal, so that the second terminal receives and displays the sixth question and the at least one response message, and transmits first response information corresponding to the sixth question to the server, wherein the first response information corresponding to the sixth question is response information selected from the at least one response message received by the second terminal, or response information input for the sixth question and received by the second terminal;
receiving the first response information corresponding to the sixth question through the network interface 1104, and updating the sixth question and the first response information corresponding to the sixth question to the question-answer database.
It should be noted that the receiving unit 901, the sending unit 903, and the acquiring unit 905 in fig. 9a or 9b may be implemented by the network interface 1104 in fig. 11, and the searching unit 902, the first generating unit 904, the grouping unit 906, the screening unit 907, and the second generating unit 908 in fig. 9a or 9b may be implemented by the processor 1101 in fig. 11.
It should be understood that the dialogue device 1100 described in the embodiments of the present invention may perform the dialogue method described in any of the embodiments corresponding to fig. 4 and fig. 6, which is not repeated here. Likewise, the beneficial effects of the same method are not described again.
Referring to fig. 12, fig. 12 is a schematic structural diagram of another dialogue device 1200 according to an embodiment of the invention. As shown in fig. 12, the dialogue device 1200 may correspond to the first terminal 101 in the embodiment corresponding to fig. 1, and the dialogue device 1200 may include: a processor 1201, a network interface 1204 and a memory 1205, and the dialogue device 1200 may further include: a user interface 1203, and at least one communication bus 1202. The communication bus 1202 is used to enable connection and communication among these components. The user interface 1203 may include a display screen (Display) and a keyboard (Keyboard), and the user interface 1203 may also include a standard wired interface and a wireless interface. The network interface 1204 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1205 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory. The memory 1205 may also optionally be at least one storage device located remotely from the processor 1201. As shown in fig. 12, an operating system, a network communication module, a user interface module, and a device control application program may be included in the memory 1205, which is one type of computer-readable storage medium.
In the dialog device 1200 shown in fig. 12, the network interface 1204 may provide network communication functions; while user interface 1203 is primarily an interface for providing input to a user; and processor 1201 may be configured to invoke a device control application stored in memory 1205 to perform:
generating a dialogue request according to an input first question, wherein the dialogue request is used for requesting, from the server, response information corresponding to the first question;
a dialogue request is sent to a server, so that after the server receives the dialogue request, the server searches second questions matched with the first questions in N questions of a question-answer database, and sends first answer information corresponding to the second questions in the question-answer database to the first terminal when the second questions are found, wherein the question-answer database comprises the N questions and the first answer information corresponding to each of the N questions, and N is a positive integer;
and receiving and outputting the first response information.
It should be noted that the receiving unit 1003 and the sending unit 1002 in fig. 10 may be implemented by the network interface 1204 in fig. 12, and the generating unit 1001 in fig. 10 may be implemented by the processor 1201 in fig. 12.
It should be understood that the dialogue device 1200 described in the embodiments of the present invention may perform the dialogue method described in any of the embodiments corresponding to fig. 4 and fig. 6, which is not repeated here. Likewise, the beneficial effects of the same method are not described again.
Furthermore, it should be noted that the embodiment of the present invention further provides a computer storage medium, in which a computer program executed by the aforementioned dialogue device 900a, 900b or 1100 is stored. The computer program includes program instructions which, when executed by a processor, can perform the method executed by the first server in the embodiment corresponding to fig. 4 or fig. 6, which is not repeated here.
Furthermore, it should be noted that the embodiment of the present invention further provides a computer storage medium, in which a computer program executed by the aforementioned dialogue device 1000 or 1200 is stored. The computer program includes program instructions which, when executed by a processor, can perform the method executed by the first terminal in the embodiment corresponding to fig. 4 or fig. 6, which is not repeated here.
In addition, the description of the beneficial effects of the same method is omitted. For technical details not disclosed in the embodiments of the computer storage medium according to the present invention, please refer to the description of the method embodiments of the present invention.
Those skilled in the art will appreciate that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), or the like.
The foregoing disclosure is illustrative of the present invention and is not to be construed as limiting the scope of the invention, which is defined by the appended claims.
Claims (10)
1. A method of dialogue, applied to a server, comprising:
receiving a dialogue request sent by a first terminal, wherein the dialogue request is used for requesting, from the server, response information corresponding to a first question;
searching for a second question that matches the first question among N questions of a question-answer database, wherein the question-answer database comprises the N questions and first response information corresponding to each of the N questions, and N is a positive integer;
in the case that the second question is found, sending the first response information corresponding to the second question in the question-answer database to the first terminal;
when the second question is not found, generating response information for the first question through a question answering system, which comprises: classifying the first question, performing semantic understanding, generating corresponding question conceptual diagrams, performing document retrieval for the first question in a network knowledge base, clustering the question conceptual diagrams according to the retrieval results, and ranking the answers to obtain the response information of the first question; and transmitting the generated response information to the first terminal.
2. The method according to claim 1, wherein searching for a second question matching the first question among the N questions in the question-and-answer database specifically comprises:
determining the maximum similarity in a first similarity set according to the similarity between the first question and each of the N questions, wherein the first similarity set comprises the similarity between the first question and each of the N questions;
if the maximum similarity is greater than a first threshold, determining that the second question is the question corresponding to the maximum similarity;
and if the maximum similarity is smaller than the first threshold, determining that the second question is not found in the question-answer database.
3. The method according to claim 1 or 2, wherein a third question is any one of the N questions, the method further comprising:
calculating the cosine similarity of the word frequency vector of the first question and the word frequency vector of the third question to obtain the similarity of the first question and the third question.
4. The method of claim 1 or 2, further comprising:
acquiring a first question set, wherein the first question set is a set of questions that the server has historically responded to;
grouping the questions in the first question set to obtain L question groups;
and screening, from the L question groups, K question groups whose total frequencies are larger than a second threshold according to the total frequency of each question group in the L question groups, wherein the total frequency of a first question group is the sum of the frequencies with which the server has historically responded to all questions in the first question group, and the first question group is any one question group in the L question groups.
5. The method of claim 4, wherein the grouping of the questions in the first question set to obtain L question groups comprises:
calculating the similarity of a fourth question and each question in a second question set, wherein the second question set is the set of questions in the first question set that are not currently grouped, and the fourth question is one question in the second question set;
dividing the questions in the second question set whose similarity with the fourth question is greater than a third threshold into one question group.
6. The method of claim 5, wherein a fifth question is any question in the second question set, and the calculating of the similarity between the fourth question and each question in the second question set specifically comprises:
determining keywords of the fourth question according to the weight of each word in the fourth question, wherein the weight of a first word represents the contribution of the first word to the semantics of the fourth question, and the first word is any word in the fourth question;
determining keywords of the fifth question according to the weight of each word in the fifth question, wherein the weight of a second word represents the contribution of the second word to the semantics of the fifth question, and the second word is any word in the fifth question;
determining word frequency vectors of the fourth question and the fifth question according to the keywords of the fourth question and the keywords of the fifth question;
and calculating the cosine similarity of the word frequency vector of the fourth question and the word frequency vector of the fifth question to obtain the similarity of the fourth question and the fifth question.
7. The method according to claim 4, wherein the method further comprises:
generating at least one response message for a sixth question, wherein the sixth question is any question in the K question groups;
transmitting the sixth question and the at least one response message to a second terminal, so that the second terminal receives and displays the sixth question and the at least one response message, and transmits first response messages corresponding to the sixth question to the server, wherein the first response messages corresponding to the sixth question are selected response messages in the at least one response message received by the second terminal or response messages received by the second terminal and input for the sixth question;
and receiving first response information corresponding to the sixth question, and updating the sixth question and the first response information corresponding to the sixth question to the question-answer database.
8. A dialog device for use with a server, comprising:
a receiving unit, configured to receive a dialogue request sent by a first terminal, where the dialogue request is used to request response information corresponding to a first question;
a searching unit, configured to search for a second question that matches the first question among N questions of a question-answer database, wherein the question-answer database comprises the N questions and first response information corresponding to each of the N questions, and N is a positive integer;
a sending unit, configured to send the first response information corresponding to the second question in the question-answer database to the first terminal in the case that the searching unit finds the second question;
a first generating unit, configured to generate, when the second question is not found, response information for the first question through a question answering system, which includes: classifying the first question, performing semantic understanding, generating corresponding question conceptual diagrams, performing document retrieval for the first question in a network knowledge base, clustering the question conceptual diagrams according to the retrieval results, and ranking the answers to obtain the response information of the first question;
The sending unit is further configured to send the generated response information to the first terminal.
9. A dialog device comprising a processor and a memory, the processor being connected to the memory, wherein the memory is for storing program code, the processor being for invoking the program code to implement the method of any of claims 1-7.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program or computer instructions, which, when executed, implement the method of any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910806215.5A CN110795542B (en) | 2019-08-28 | 2019-08-28 | Dialogue method, related device and equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110795542A (en) | 2020-02-14 |
CN110795542B (en) | 2024-03-15 |
Family
ID=69427065
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910806215.5A Active CN110795542B (en) | 2019-08-28 | 2019-08-28 | Dialogue method, related device and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110795542B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111339282A (en) * | 2020-03-27 | 2020-06-26 | 中国建设银行股份有限公司 | Intelligent online response method and intelligent customer service system |
CN111537968B (en) * | 2020-05-12 | 2022-03-01 | 江铃汽车股份有限公司 | Angle radar calibration method and system |
CN111694941B (en) * | 2020-05-22 | 2024-01-05 | 腾讯科技(深圳)有限公司 | Reply information determining method and device, storage medium and electronic equipment |
CN112632239A (en) * | 2020-12-11 | 2021-04-09 | 南京三眼精灵信息技术有限公司 | Brain-like question-answering system based on artificial intelligence technology |
CN112818225A (en) * | 2021-01-27 | 2021-05-18 | 上海明略人工智能(集团)有限公司 | Display method and device of pushed data |
CN112800209A (en) * | 2021-01-28 | 2021-05-14 | 上海明略人工智能(集团)有限公司 | Conversation corpus recommendation method and device, storage medium and electronic equipment |
CN113283238B (en) * | 2021-05-19 | 2023-12-22 | 上海明略人工智能(集团)有限公司 | Text data processing method and device, electronic equipment and storage medium |
CN113360626B (en) * | 2021-07-02 | 2022-02-11 | 北京容联七陌科技有限公司 | Multi-scene mixed question-answer recommendation method for intelligent customer service robot |
CN116860951B (en) * | 2023-09-04 | 2023-11-14 | 贵州中昂科技有限公司 | Information consultation service management method and management system based on artificial intelligence |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101217515A (en) * | 2008-01-03 | 2008-07-09 | 腾讯科技(深圳)有限公司 | A system and method based on question sorting and push |
CN102789496A (en) * | 2012-07-13 | 2012-11-21 | 携程计算机技术(上海)有限公司 | Method and system for implementing intelligent response |
CN104216913A (en) * | 2013-06-04 | 2014-12-17 | Sap欧洲公司 | Problem answering frame |
CN108170792A (en) * | 2017-12-27 | 2018-06-15 | 北京百度网讯科技有限公司 | Question and answer bootstrap technique, device and computer equipment based on artificial intelligence |
CN108491433A (en) * | 2018-02-09 | 2018-09-04 | 平安科技(深圳)有限公司 | Chat answer method, electronic device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110795542A (en) | 2020-02-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
REG | Reference to a national code |
Ref country code: HK Ref legal event code: DE Ref document number: 40020199 Country of ref document: HK |
|
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |