CN107239574B

CN107239574B - Knowledge-problem matching method and device for an intelligent question answering system

Info

Publication number: CN107239574B
Application number: CN201710513108.4A
Authority: CN
Inventors: 陈飞; 崔培君; 乔思龙; 王萌萌
Original assignee: Beijing Shenzhou Taiyue Software Co Ltd
Current assignee: Dingfu Intelligent Technology Co., Ltd
Priority date: 2017-06-29
Filing date: 2017-06-29
Publication date: 2018-11-02
Anticipated expiration: 2037-06-29
Also published as: CN107239574A

Abstract

This application discloses a knowledge-problem matching method and device for an intelligent question answering system. The matching process fuses two similarity measures, weight similarity and vector similarity, which compensates for the systematic error of any single similarity measure. Before the weight similarity and the vector similarity are calculated, the word segmentation result is preprocessed to remove stop words, which reduces the false-trigger rate. In addition, the weights of the knowledge words obtained after preprocessing are normalized so that their values lie in [0, 1], which reduces the deviation in the weight similarity caused by large differences between knowledge word weights. The weight similarity between the problem and the alternative knowledge is therefore more accurate, which improves the accuracy of the total similarity and, in turn, the accuracy of knowledge-problem matching in the intelligent question answering system.

Description

A knowledge-problem matching method and device for an intelligent question answering system

Technical field

This application relates to the field of natural language processing, and in particular to a knowledge-problem matching method and device for an intelligent question answering system.

Background technology

An intelligent question answering system is a system in which a customer looks up answers to problems by self-service human-computer interaction. It generally includes a client and a server connected over a network. A knowledge base and a corresponding answer library are preset in the server, and the knowledge in the knowledge base corresponds one-to-one with the answers in the answer library. The server matches the text of the problem obtained from the client against the knowledge in the preset knowledge base, and then returns the answer corresponding to the matched knowledge to the client in order to answer the client's problem.

There are usually two ways to match the problem text against the preset knowledge. The first requires the text of the problem entered by the user to be identical to a preset knowledge in the knowledge base; the second selects the preset knowledge in the knowledge base with the highest similarity to the text of the problem entered by the user. With the first approach, the problem posed by the user is often not exactly the same as the entries in the knowledge base. For example, suppose the pre-established knowledge base includes four knowledge items: 1. credit card handles flow, 2. credit card logout flow path, 3. mass transit card handles flow, and 4. mass transit card logout flow path. When the client inputs "credit card handles flow", the intelligent question answering system matches knowledge 1; when the client inputs "how credit card is handled", the system cannot find a match. With the second approach, common similarity calculation methods have systematic deviations, so the similarity between the problem and its corresponding knowledge is often not the maximum. This can cause the problem to be matched to the wrong knowledge and an irrelevant answer to be returned. In the example above, when the client inputs "how credit card is handled", the intelligent question answering system may consider knowledge 3 to be the most similar and return the answer corresponding to knowledge 3; that is, the accuracy of this approach is poor.

Therefore, there is an urgent need for a method and device for an intelligent question answering system that can accurately match the relevant knowledge to a user's loosely worded question.

Invention content

This application provides a knowledge-problem matching method and device for an intelligent question answering system, in order to solve the problem that inaccurate matching between problems and knowledge in an intelligent question answering system leads to low accuracy of the extracted answers.

The present invention provides the following aspects:

In a first aspect, this application provides a knowledge-problem matching method for an intelligent question answering system, the method including:

Obtain the problem of client is sent；

Obtain the weight similarity of the alternative knowledge of each and described problem respectively using knowledge word and problem word；

Obtain the vector similarity of alternative knowledge and described problem described in each respectively using knowledge word and problem word；

Using the weight similarity and the vector similarity, the alternative knowledge of each and described problem are calculated separately Total similarity；

The alternative knowledge that total similarity meets preset rules is obtained, as the knowledge to match with described problem.

Optionally, before the weight similarity of each alternative knowledge and the problem is obtained, the method further includes:

Generating a knowledge base, the knowledge base including at least one alternative knowledge；

Knowledge preprocessing: performing word segmentation on the alternative knowledge and removing the stop words from the word segmentation result, so as to obtain the knowledge words in the alternative knowledge.

Optionally, the knowledge word is prepared by the following：

Word segmentation processing is carried out to the alternative knowledge；

Remove the stop words in word segmentation processing result, to obtain the knowledge word in the alternative knowledge；

Optionally, described problem word is prepared by the following：

Word segmentation processing is carried out to described problem；

Remove the stop words in word segmentation processing result, to obtain the problems in described problem word.

Optionally, obtaining the weight similarity of each alternative knowledge and the problem using the knowledge words and the problem words includes：

Obtain the weight of knowledge word in alternative knowledge；

Rule, which is assigned, according to preset weight assigns weight to problem word in problem；

Utilize weight similarity described in the weight of knowledge word and the weight calculation of problem word.

Optionally, the weight for obtaining knowledge word in alternative knowledge, including：

The weight of each knowledge word is obtained, the weight of the knowledge word is weight of the knowledge word in this knowledge；

The weight of each knowledge word is normalized.

Optionally, the weight assignment rule is to judge whether the problem word meets the condition for assigning a preset weight; if it does, the problem word is assigned the preset weight；

If it does not, the weight of the problem word in the problem is the average of the weights, in each alternative knowledge, of all knowledge words identical to the problem word；

The condition for assigning the preset weight is that the knowledge words do not include the problem word.

Optionally, the vectorial phase that alternative knowledge and described problem described in each are obtained using knowledge word and problem word Like degree, including：

The vector of the alternative knowledge is obtained,

The vector of described problem is obtained,

The vector similarity is calculated using the vector of the vector sum described problem of the alternative knowledge.

Optionally, the vector for obtaining the alternative knowledge includes：

The term vector of knowledge word is obtained, the term vector of the knowledge word is word of the knowledge word in the alternative knowledge Vector；

The vector of the alternative knowledge is calculated using the term vector of the knowledge word.

Optionally, the vector for obtaining described problem includes：

Obtaining the term vector of each problem word, the term vector of a problem word being identical to the term vector of the identical knowledge word；

The vector of described problem is calculated using the term vector of knowledge word.

Optionally, the weight similarity is obtained using one of Jaccard (Jaccard distance), Hamming distance and edit distance, or a combination of several of them；

The vector similarity is obtained using cosine manner；

The total similarity of the alternative knowledge and the problem is the linearly weighted sum of the weight similarity and the vector similarity of the same alternative knowledge and the problem.

Optionally, the preset rule is to sort the total similarities of all alternative knowledge with the problem and select the alternative knowledge with the largest total similarity.

In the knowledge-problem matching process of the intelligent question answering system, this application fuses two similarity measures, weight similarity and vector similarity, which compensates for the systematic error of any single similarity measure. Moreover, before the weight similarity and the vector similarity are calculated, the word segmentation result is preprocessed to remove stop words, which reduces the false-trigger rate. In addition, the weights of the knowledge words obtained after preprocessing are normalized so that their values lie in [0, 1], which reduces the deviation in the weight similarity caused by large differences between knowledge word weights. The weight similarity between the problem and the alternative knowledge is therefore more accurate, which improves the accuracy of the total similarity and, in turn, the accuracy of knowledge-problem matching in the intelligent question answering system.

In a second aspect, this application also provides a knowledge-problem matching device for an intelligent question answering system, the device including：

Problem acquiring unit, for obtaining the problem of client is sent；

Weight similarity acquiring unit, for utilize knowledge word and problem word obtain respectively the alternative knowledge of each with it is described The weight similarity of problem；

Vector similarity acquiring unit, for utilize knowledge word and problem word obtain respectively alternative knowledge described in each with The vector similarity of described problem；

Total similarity calculated calculates separately each for utilizing the weight similarity and the vector similarity Total similarity of the alternative knowledge of item and described problem；

Knowledge-problem matching unit meets the alternative knowledge of preset rules for obtaining total similarity, is asked as with described Inscribe the knowledge to match.

Optionally, the knowledge word is prepared by the following：

Word segmentation processing is carried out to the alternative knowledge；

Optionally, described problem word is prepared by the following：

Word segmentation processing is carried out to described problem；

Optionally, the weight similarity acquiring unit includes：

Knowledge word Weight Acquisition subelement, the weight for obtaining knowledge word in alternative knowledge；

Problem word weight assigns subelement, and power is assigned to problem word in problem for assigning rule according to preset weight Weight；

Weight similarity calculation subelement, for utilizing weight phase described in weight calculation of the weight of knowledge word with problem word Like degree.

Optionally, the knowledge word Weight Acquisition subelement includes：

A raw weight acquisition secondary unit, for obtaining the weight of each knowledge word, the weight of a knowledge word being the weight of that knowledge word in its own knowledge；

A normalization secondary unit, for normalizing the weight of each knowledge word.

Optionally, in the problem word weight assigning subelement, the weight assignment rule is to judge whether the problem word meets the condition for assigning a preset weight; if it does, the problem word is assigned the preset weight; if it does not, the weight of the problem word in the problem is the average of the weights, in each alternative knowledge, of all knowledge words identical to the problem word；

Optionally, the vector similarity acquiring unit includes：

Knowledge vector obtains subelement, the vector for obtaining the alternative knowledge；

Problem vector obtains subelement, the vector for obtaining described problem；

Vector similarity computation subunit, the vector for the vector sum described problem using the alternative knowledge calculate institute State vector similarity.

Optionally, the knowledge vector acquisition subelement includes：

A knowledge word term vector acquisition secondary unit, for obtaining the term vector of each knowledge word, the term vector of a knowledge word being the term vector of that knowledge word in the alternative knowledge；

A knowledge vector calculation secondary unit, for calculating the vector of the alternative knowledge using the term vectors of the knowledge words.

Optionally, described problem vector acquisition subelement includes：

A problem word term vector acquisition secondary unit, for obtaining the term vector of each problem word, the term vector of a problem word being identical to the term vector of the identical knowledge word；

A problem vector calculation secondary unit, for calculating the vector of the problem using the term vectors of the knowledge words.

The vector similarity is obtained by the way of cosine；

The total similarity of the alternative knowledge and the problem is the linearly weighted sum of the weight similarity and the vector similarity of the same alternative knowledge and the problem；

The preset rule is to sort the total similarities of all alternative knowledge with the problem and select the alternative knowledge with the largest total similarity.

Description of the drawings

In order to illustrate the technical solution of this application more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

Fig. 1 is the flow chart of one embodiment of the application intelligent Answer System knowledge-problem matching process；

Fig. 2 is the flow chart that the application S102 obtains weight similarity one embodiment；

Fig. 3 is the flow chart of one embodiment that the application S103 obtains vector similarity；

Fig. 4 is the apparatus structure schematic diagram of the application intelligent Answer System knowledge-problem coalignment one embodiment；

Fig. 5 is the structural schematic diagram of 402 one embodiment of the application weight similarity acquiring unit；

Fig. 6 is the apparatus structure schematic diagram of 403 one embodiment of the application vector similarity acquiring unit；

Fig. 7 is the structural schematic diagram of computer system provided by the embodiments of the present application.

Specific implementation mode

The present invention is described in detail below, and its features and advantages will become clearer from this description.

According to the application's in a first aspect, providing a kind of knowledge of intelligent Answer System-problem matching process, such as Fig. 1 It is shown, wherein this method includes：

S101 obtains the problem of client is sent；

S102 obtains the weight similarity of the alternative knowledge of each and described problem using knowledge word and problem word respectively；

In this application, knowledge word is the word segmentation result of alternative knowledge；Problem word is the participle for the problem of client is sent As a result.

It is similar to the vector of described problem that S103 utilizes knowledge word and problem word to obtain alternative knowledge described in each respectively Degree；

S104 utilizes the weight similarity and the vector similarity, calculates separately the alternative knowledge of each and is asked with described Total similarity of topic；

S105 obtains the alternative knowledge that total similarity meets preset rules, as the knowledge to match with described problem.

In this application, the knowledge word is prepared by the following：

Word segmentation processing is carried out to the alternative knowledge；

Remove the stop words in word segmentation processing result, to obtain the knowledge word in the alternative knowledge.

In this application, described problem word is prepared by the following：

Word segmentation processing is carried out to described problem；

In this application, an alternative knowledge includes one standard question and optional extended questions, where an extended question is a different form of expression of the standard question that conveys the same meaning. For example, when a bank explains how to handle a credit card, the alternative knowledge related to "how to handle credit card" stored in the knowledge base includes "credit card handles flow", "where I can handle credit card", "handling credit card step" and so on. One of these questions is taken as the standard question and the remaining questions are taken as extended questions. In this embodiment, for example, the first question, "credit card handles flow", can be taken as the standard question and the other questions as its extended questions; in other embodiments, another question can be designated as the standard question.

It should be noted that the standard question and the extended questions may take the form of a semantic formula or the form of a specific question sentence; both fall within the protection scope of the present invention.

In this application, the knowledge words of each alternative knowledge are deduplicated; that is, within the word segmentation result of one alternative knowledge, identical entries are counted as a single entry. For example, for the alternative knowledge "I want to handle credit card, how I handle China Merchants Bank's credit card", the word segmentation result is "I", "handling", "credit card", "I", "how", "handling", "China Merchants Bank", "credit card". Although the character strings "I", "handling" and "credit card" each appear twice in this alternative knowledge, its knowledge words are "I", "handling", "credit card", "how" and "China Merchants Bank".

Stop word removal works from a pre-established stop word list. When stop words are removed, each entry in the word segmentation result is matched against the words in the stop word list; if the entry is present in the stop word list, it is deleted from the entry string produced by Chinese word segmentation.

The stop words described herein are words with no practical meaning, such as modal particles or structural particles.

The inventors found that removing stop words removes noise from the problem, so that the similarity between the alternative knowledge and the problem is more accurate. This improves the accuracy of knowledge-problem matching and, in turn, the accuracy of the answers given by the intelligent question answering system.

In this application, steps S111 and S112 may be performed either before or after step S101.
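The preprocessing described above (stop word removal followed by deduplication) can be sketched as follows. This is only an illustrative sketch: the function name `extract_words` is hypothetical, the tokens are assumed to come from an unspecified word segmenter, and the stop word list is left to the caller.

```python
def extract_words(tokens, stop_words):
    """Drop stop words and duplicates, keeping first-occurrence order."""
    seen = set()
    words = []
    for token in tokens:
        # Skip entries found in the stop word list or already collected.
        if token in stop_words or token in seen:
            continue
        seen.add(token)
        words.append(token)
    return words

# Segmentation result from the deduplication example in the description.
tokens = ["I", "handling", "credit card", "I", "how",
          "handling", "China Merchants Bank", "credit card"]
knowledge_words = extract_words(tokens, stop_words=set())
print(knowledge_words)
# ['I', 'handling', 'credit card', 'how', 'China Merchants Bank']
```

The same function could be applied to the problem text, since the description states that problems and alternative knowledge are segmented and filtered in the same way.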

In the application S101, it is the text for obtaining client and sending problem to obtain the problem of client is sent.

In this application, as shown in Fig. 1, the order of step S102 and step S103 is not specifically limited and can be arbitrary; that is, the two steps can be performed in the order S102-S103 or S103-S102.

In this application, the weight similarity refers to the similarity between the knowledge and the problem calculated from the weights of the knowledge words and the weights of the problem words.

In the application S102, as shown in Fig. 2, described obtain the alternative knowledge of each respectively using knowledge word and problem word Weight similarity with described problem includes：

S201 obtains the weight of knowledge word in alternative knowledge；

S202 assigns rule according to preset weight and assigns weight to problem word in problem；

S203 utilizes the weight of knowledge word and weight similarity described in the weight calculation of problem word.

In the application S201, the weight for obtaining knowledge word in alternative knowledge includes：

S2011 obtains the weight of each knowledge word, and the weight of the knowledge word is the knowledge word in this knowledge Weight；

The weight of each knowledge word is normalized in S2012.

In step S2011 of this application, the weight of each knowledge word is obtained by the tf-idf (term frequency-inverse document frequency) method.

In step S2012 of this application, the weight of each knowledge word is normalized so that its value lies in [0, 1].

The inventors found that the weights obtained by the tf-idf method span a wide range, which can reach, for example, [1, 4000]. If these raw weights were used for each knowledge word, knowledge words with small weights would contribute approximately 0 to the similarity calculation, so the calculated weight similarity would differ greatly from the true similarity; that is, the weight similarity would be seriously distorted, and the answers provided by the intelligent question answering system would have low accuracy.

By normalizing the weight of each knowledge word so that its value lies in [0, 1], this application both preserves the distribution of the knowledge word weights and narrows the gap between them, so that the similarity calculation between the alternative knowledge and the problem is more reasonable and accurate.
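As a rough illustration of how tf-idf weights might be obtained for the knowledge words of each alternative knowledge, the sketch below uses one common smoothed variant of the inverse document frequency. The description does not state which tf-idf variant is used, so the smoothing, the function name `tf_idf` and the tokenized knowledge base are all assumptions.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: tf-idf weight} dict per tokenized document."""
    n_docs = len(documents)
    # Number of documents that contain each word.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            # Term frequency times smoothed inverse document frequency.
            word: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
            for word, count in counts.items()
        })
    return weights

# Hypothetical segmentation of the four alternative knowledge items.
knowledge_base = [
    ["credit card", "handle", "flow"],
    ["credit card", "logout", "flow"],
    ["mass transit card", "handle", "flow"],
    ["mass transit card", "logout", "flow"],
]
weights = tf_idf(knowledge_base)
```

With this variant, a word that occurs in every alternative knowledge (here "flow") receives a lower weight than a word that occurs in only some of them (here "credit card").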
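The description does not name the normalization scheme. A minimal sketch, assuming the weights of one alternative knowledge are non-negative and are divided by their sum, could look like this (the function name `normalize_weights` and the raw values are hypothetical; min-max or max scaling would be equally plausible readings):

```python
def normalize_weights(weights):
    """Scale non-negative weights so that each lies in [0, 1] and they sum to 1."""
    total = sum(weights.values())
    if total == 0:
        # Nothing to scale; avoid dividing by zero.
        return {word: 0.0 for word in weights}
    return {word: value / total for word, value in weights.items()}

# Hypothetical raw weights spanning a wide range.
raw = {"credit card": 4000.0, "handle": 800.0, "flow": 200.0}
normalized = normalize_weights(raw)
print(normalized)  # {'credit card': 0.8, 'handle': 0.16, 'flow': 0.04}
```

Dividing by the sum keeps the relative proportions of the weights while bounding every value by 1; three equal raw weights would each become 1/3.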

In step S202 of this application, the weight assignment rule is to judge whether the problem word meets the condition for assigning a preset weight; if it does, the problem word is assigned the preset weight; if it does not, the weight of the problem word in the problem is the average of the weights, in each alternative knowledge, of all knowledge words identical to the problem word.

In this application, the condition for assigning the preset weight is that the knowledge words do not include the problem word.

In this application, all problem words that meet the condition for assigning the preset weight receive the same preset weight. For example, if the obtained problem is "how I handle credit card", the result after word segmentation and stop word removal is "I", "how", "handle", "credit card". The knowledge words include only "how", "handle" and "credit card"; that is, the knowledge words do not include the problem word "I". Therefore, the problem word "I" is assigned the preset weight (for example, 0.2).

In step S2023 of this application, word segmentation of different alternative knowledge may yield identical knowledge words, and the weights of such an identical knowledge word in the different alternative knowledge may be the same or may differ. When they differ, assigning any single one of these weights to the problem word would be one-sided and inaccurate, whereas the average weight of the knowledge word across all alternative knowledge is generally representative and makes the knowledge weights and the problem weights more accurate.

In one embodiment of this application, the alternative knowledge is "credit card handles flow", "credit card logout flow path", "mass transit card handles flow" and "mass transit card logout flow path", and the average weight of each knowledge word across the alternative knowledge is obtained as follows：
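The weight assignment rule can be sketched as below. This is one reading of the rule, in which the average is taken over the alternative knowledge that actually contain the word; the function name `assign_problem_weights`, the default value and the example weights are hypothetical.

```python
def assign_problem_weights(problem_words, knowledge_weights, preset_weight=0.2):
    """Assign a weight to each problem word.

    knowledge_weights holds one {knowledge word: normalized weight} dict
    per alternative knowledge.
    """
    assigned = {}
    for word in problem_words:
        # Weights of the identical knowledge word in every alternative
        # knowledge that contains it.
        found = [w[word] for w in knowledge_weights if word in w]
        if not found:
            # No knowledge word matches: use the preset weight.
            assigned[word] = preset_weight
        else:
            assigned[word] = sum(found) / len(found)
    return assigned

# Hypothetical normalized weights for two alternative knowledge items.
knowledge_weights = [
    {"credit card": 0.5, "handle": 0.3, "flow": 0.2},
    {"credit card": 0.3, "logout": 0.5, "flow": 0.2},
]
problem_weights = assign_problem_weights(
    ["I", "handle", "credit card"], knowledge_weights)
```

Here "I" matches no knowledge word and receives the preset weight, "handle" takes its single known weight, and "credit card" takes the average of its two weights.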

Wherein, each knowledge word average value of weight in the alternative knowledge of each is respectively：

In this application, further include before the weight of problem word in S202 acquisition described problems：

S221 carries out word segmentation processing to described problem；

S222 removes the stop words in word segmentation processing result, to obtain the problems in described problem word.

In this application, word segmentation processing mode is carried out to described problem and carries out the side of word segmentation processing to the alternative knowledge Formula is identical.

In this application, remove the mode of stop words in described problem word segmentation result and remove the alternative knowledge point with described The mode of stop words is identical in word result.

In step S203 of this application, the weight similarity is calculated from the weights of the knowledge words and the weights of the problem words using one of Jaccard (Jaccard distance), Hamming distance and edit distance, or a combination of several of them.

In one embodiment of this application, the weight similarity between the alternative knowledge "credit card handles flow" and the problem "how I handle credit card" is calculated as follows：

Let set A be the set of problem words and their weights, and set B be the set of knowledge words and their weights. Set A is then "I", "how", "handling" and "credit card", with weights 1/5, 1/3, 1/3 and 1/3 respectively; set B is "credit card", "handling" and "flow", with weights 1/3, 1/3 and 1/3 respectively. The weight similarity of the problem and the alternative knowledge is：

Jaccard (A, B)=| A intersect B |/| A union B |

Wherein, Jaccard (A, B) indicates the weight similarity of set A and B；

| A intersect B | indicate that A, B two gathers the sum of the weight of intersection；

| A union B | indicate that A, B two gathers the sum of the weight of union；

In this embodiment, the intersection of sets A and B is "handling" and "credit card", with weights 1/3 and 1/3; the union of sets A and B is "I", "how", "handling", "credit card" and "flow", with weights 1/5, 1/3, 1/3, 1/3 and 1/3；

Then Jaccard (A, B) = (1/3+1/3)/(1/5+1/3+1/3+1/3+1/3) = (2/3)/(23/15) = 10/23; that is, the weight similarity of the alternative knowledge and the problem is 10/23.
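The weighted Jaccard calculation can be checked with exact fractions. The sketch below is illustrative only: the function name `weighted_jaccard` is hypothetical, and when a word appears in both sets with different weights the sketch takes the weight from the first set, a choice the description does not specify.

```python
from fractions import Fraction

def weighted_jaccard(a, b):
    """Sum of weights over the intersection divided by the sum over the union.

    a and b map words to weights.
    """
    def weight_of(word):
        # Assumption: prefer the weight from the first set when both have one.
        return a[word] if word in a else b[word]

    intersection_sum = sum((weight_of(w) for w in a.keys() & b.keys()), Fraction(0))
    union_sum = sum((weight_of(w) for w in a.keys() | b.keys()), Fraction(0))
    if union_sum == 0:
        return Fraction(0)
    return intersection_sum / union_sum

# Problem words (A) and knowledge words (B) with the weights of this embodiment.
A = {"I": Fraction(1, 5), "how": Fraction(1, 3),
     "handling": Fraction(1, 3), "credit card": Fraction(1, 3)}
B = {"credit card": Fraction(1, 3), "handling": Fraction(1, 3),
     "flow": Fraction(1, 3)}
print(weighted_jaccard(A, B))  # 10/23
```

Evaluated exactly, the intersection sums to 2/3 and the union to 23/15, which gives 10/23.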

In the application S103, as shown in figure 3, described obtained described in each alternatively respectively using knowledge word and problem word The vector similarity of knowledge and described problem includes：

S301 obtains the vector of the alternative knowledge,

S302 obtains the vector of described problem,

S303 calculates the vector similarity using the vector of the vector sum described problem of the alternative knowledge.

In this application, the vector similarity refers to the similarity between the knowledge and the problem calculated from the vectors of the knowledge words and the vectors of the problem words.

In the application S301, the vector for obtaining the alternative knowledge, including：

S3011 obtains the term vector of knowledge word, and the term vector of the knowledge word is the knowledge word in the alternative knowledge In term vector；

S3012 calculates the vector of the alternative knowledge using the term vector of the knowledge word.

In this application, the dimension of described problem word term vector is identical as the dimension of knowledge word term vector.

Identical knowledge words in the knowledge base have identical term vectors, and the term vector of a knowledge word is obtained by either word2vec (word to vector, i.e. word embedding) or one-hot (one-hot encoding).

In this application, the vector of the alternative knowledge is the average term vector of all knowledge words in the alternative knowledge; that is, the vector obtained by averaging all knowledge word vectors in each dimension. For example, the alternative knowledge is "credit card handles flow", the result of word segmentation is "credit card", "handling" and "flow", and their vectors are respectively：

Credit card vector indicates [8/10,1/10,1/10]

It handles vector and indicates [3/10,6/10,1/10]

Flow vector indicates [4/10,2/10,4/10]

Then the vector of the alternative knowledge is [(8/10+3/10+4/10)/3, (1/10+6/10+2/10)/3, (1/10+1/10 + 4/10)/3]=[1/2,3/10,1/5].
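The per-dimension average above can be reproduced with a few lines of code. The helper name `mean_vector` is hypothetical, and exact fractions are used only so that the result can be compared directly with the values in the text.

```python
from fractions import Fraction as F

def mean_vector(word_vectors):
    """Average a list of equal-length term vectors dimension by dimension."""
    n = len(word_vectors)
    # zip(*...) groups the components that share a dimension.
    return [sum(component) / n for component in zip(*word_vectors)]

credit_card = [F(8, 10), F(1, 10), F(1, 10)]
handling = [F(3, 10), F(6, 10), F(1, 10)]
flow = [F(4, 10), F(2, 10), F(4, 10)]

knowledge_vector = mean_vector([credit_card, handling, flow])
print(knowledge_vector)
# [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
```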

In the application S302, the vector for obtaining described problem, including：

S3021 obtains the term vector of problem word, the term vector of the term vector of described problem word and the identical knowledge word It is identical；

S3022 calculates the vector of described problem using the term vector of knowledge word.

In this application, the method phase of the vector for obtaining described problem and the vector for obtaining the alternative knowledge Together.

In the application S303, the vector of the vector sum described problem using the alternative knowledge calculates the vector Similarity, the vector similarity are obtained by the way of cosine.

In one embodiment of this application, the cosine method is used as an example to illustrate how the vector similarity is obtained. For example, to calculate the vector similarity of the alternative knowledge "credit card handles flow" and the problem "how I handle credit card", the term vectors of the knowledge words and the problem words are set to be three-dimensional; the alternative knowledge is subjected to word segmentation and stop word removal, and the result is：

Credit card vector indicates [8/10,1/10,1/10]

It handles vector and indicates [3/10,6/10,1/10]

Flow vector indicates [4/10,2/10,4/10]

Then the vector of the alternative knowledge, denoted A, is [(8/10+3/10+4/10)/3, (1/10+6/10+2/10)/3, (1/10+1/10+4/10)/3]=[1/2,3/10,1/5], that is, A=[1/2,3/10,1/5]；

The problem is subjected to word segmentation and stop word removal, and the result, as can be read from the per-dimension sums that follow, is：

I vector indicates [1/10,3/10,6/10]

How vector indicates [5/10,4/10,1/10]

It handles vector and indicates [3/10,6/10,1/10]

Credit card vector indicates [8/10,1/10,1/10]

Then the vector of the problem, denoted B, is [(1/10+5/10+3/10+8/10)/4, (3/10+4/10+6/10+1/10)/4, (6/10+1/10+1/10+1/10)/4]=[17/40,14/40,9/40], i.e. B=[17/40,14/40,9/40]；

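Then the vector similarity of the alternative knowledge and the problem is the cosine of the angle θ between A and B：

$$\cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$

Specifically, with A=[1/2,3/10,1/5] and B=[17/40,14/40,9/40], the similarity of A and B is：

$$\cos\theta = \frac{\frac{17}{40}\cdot\frac{1}{2} + \frac{14}{40}\cdot\frac{3}{10} + \frac{9}{40}\cdot\frac{1}{5}}{\sqrt{\left[\left(\frac{17}{40}\right)^{2} + \left(\frac{14}{40}\right)^{2} + \left(\frac{9}{40}\right)^{2}\right]\left[\left(\frac{1}{2}\right)^{2} + \left(\frac{3}{10}\right)^{2} + \left(\frac{1}{5}\right)^{2}\right]}} \approx \frac{98}{100},$$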

That is, the vector similarity of the alternative knowledge and described problem is 98/100.
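The cosine calculation can be verified numerically. The function name `cosine_similarity` is hypothetical; the vectors A and B are the ones given in this embodiment.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        # The angle is undefined for a zero vector; treat it as no similarity.
        return 0.0
    return dot / (norm_u * norm_v)

A = [1 / 2, 3 / 10, 1 / 5]       # vector of the alternative knowledge
B = [17 / 40, 14 / 40, 9 / 40]   # vector of the problem
print(round(cosine_similarity(A, B), 4))  # 0.9887
```

The computed value is approximately 0.9887, which the text reports as 98/100.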

In step S104 of this application, calculating the total similarity of each alternative knowledge and the problem from the weight similarity and the vector similarity means taking the linearly weighted sum of the weight similarity and the vector similarity of the current alternative knowledge and the problem. That is, the weight similarity is given a first preset coefficient and the vector similarity a second preset coefficient; the product of the weight similarity and the first preset coefficient and the product of the vector similarity and the second preset coefficient are calculated; and the total similarity is the sum of the two products.

In step S104 of this application, the total similarity is calculated according to the following Formula I：

$$D_{\text{total}} = a \cdot D_{\text{weight}} + b \cdot D_{\text{vector}} \qquad \text{(Formula I)}$$

Wherein, $D_{\text{total}}$ indicates the total similarity,

$D_{\text{weight}}$ indicates the weight similarity,

$D_{\text{vector}}$ indicates the vector similarity,

$a$ indicates the first preset coefficient,

$b$ indicates the second preset coefficient,

Also, $0 < a < 1$ and $a + b = 1$.

In step S105 of the application, the preset rule is to sort the total similarities of all alternative knowledge and the problem and to select the alternative knowledge with the largest total similarity.
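Steps S104 and S105 can be sketched together in Python as follows. This is a minimal illustration, not the implementation of the application: the function names `total_similarity` and `best_match` are hypothetical, and the similarity values in `example_scores` are made up purely to show the selection.

```python
def total_similarity(d_weight, d_vector, a):
    """Formula I: linear weighted sum with 0 < a < 1 and b = 1 - a."""
    if not 0 < a < 1:
        raise ValueError("the first predetermined coefficient must satisfy 0 < a < 1")
    b = 1 - a
    return a * d_weight + b * d_vector

def best_match(scores, a):
    """Return the alternative knowledge with the largest total similarity.

    scores maps each alternative knowledge to (weight similarity, vector similarity).
    """
    totals = {
        name: total_similarity(d_weight, d_vector, a)
        for name, (d_weight, d_vector) in scores.items()
    }
    return max(totals, key=totals.get), totals

# Hypothetical similarity values, for illustration only.
example_scores = {
    "candidate_1": (0.50, 0.90),
    "candidate_2": (0.20, 0.95),
    "candidate_3": (0.10, 0.40),
}
best, totals = best_match(example_scores, a=0.4)
print(best, totals)
```

Because a + b = 1 and both similarities lie in [0, 1], the total similarity also lies in [0, 1], so totals for different alternative knowledge are directly comparable.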

For a fuller understanding of the method by which the intelligent answer system of the present invention extracts knowledge, a specific embodiment is set forth below.

The problem is "how do I handle a credit card", and the alternative knowledge is "credit card handling flow", "credit card cancellation flow", "mass transit card handling flow" and "mass transit card cancellation flow". The process of matching the alternative knowledge with the problem is then as follows:

(1) Word segmentation is performed on each alternative knowledge and the stop words are removed to obtain the knowledge words; tf-idf is then used to calculate the weight of each knowledge word in its alternative knowledge, with the following result:

(2) According to the result of (1), the average weight of each knowledge word over the alternative knowledge is calculated, giving respectively:

(3) Word segmentation is performed on the problem and the stop words are removed to obtain the problem words, and weights are assigned to the problem words according to the preset weight assigning rule. In the present embodiment the preset weight is 1/5, and the weights assigned to the problem words are as follows:

Then the weight similarity of the alternative knowledge "credit card handling flow" and the problem "how do I handle a credit card" is: Jaccard(A, B) = |A ∩ B| / |A ∪ B| = (1/3+1/3)/(1/5+1/5+1/3+1/3+1/3) = (2/3)/(7/5) = 10/21, that is, the weight similarity of this alternative knowledge and the problem is 10/21. The weight similarities of the remaining alternative knowledge and the problem are calculated in the same manner, and the results are shown in Table 1 below;

(4) In the present embodiment, the term vector of each knowledge word and problem word is set to be three-dimensional, and word2vec is used to obtain the term vector of each knowledge word and problem word. Taking the alternative knowledge "credit card handling flow" as an example, the vector similarity of the alternative knowledge and the problem is calculated as follows:

The knowledge term vectors of the alternative knowledge are, in order:

The term vector of "credit card" is [8/10, 1/10, 1/10]

The term vector of "handle" is [3/10, 6/10, 1/10]

The term vector of "flow" is [4/10, 2/10, 4/10]

Averaging these term vectors gives the vector of the alternative knowledge, A = [1/2, 3/10, 1/5], and averaging the term vectors of the problem words in the same way gives the vector of the problem, B = [17/40, 14/40, 9/40]. Then the similarity of A and B is:

$$\cos\theta = \frac{\frac{17}{40}\cdot\frac{1}{2} + \frac{14}{40}\cdot\frac{3}{10} + \frac{9}{40}\cdot\frac{1}{5}}{\sqrt{\left(\frac{17}{40}\cdot\frac{17}{40} + \frac{14}{40}\cdot\frac{14}{40} + \frac{9}{40}\cdot\frac{9}{40}\right)\left(\frac{1}{2}\cdot\frac{1}{2} + \frac{3}{10}\cdot\frac{3}{10} + \frac{1}{5}\cdot\frac{1}{5}\right)}} \approx \frac{98}{100}$$

That is, the vector similarity of the alternative knowledge and the problem is 98/100. The vector similarities of the remaining alternative knowledge and the problem are calculated in the same manner, and the results are shown in Table 1 below;

(5) The linear weighted sum of the weight similarity and the vector similarity of each alternative knowledge and the problem is calculated, where the first predetermined coefficient is a = 0.4 and the second predetermined coefficient is b = 0.6. The results are shown in Table 1.

Table 1. Similarity results of the alternative knowledge and the problem

(6) Comparing the "total similarity" values in Table 1, the alternative knowledge with the largest value is "credit card handling flow", that is, the matching result is "credit card handling flow".
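The weight similarity and total similarity of the first alternative knowledge in this embodiment can be reproduced with the Python sketch below. It rests on stated assumptions: the knowledge word weights (1/3 each) and the preset weight (1/5) are read from the calculation in step (3); the labels "I" and "how" for the two problem words that match no knowledge word are assumptions; the function name `weighted_jaccard` is hypothetical; and the vector similarity is taken as the stated value 98/100 rather than recomputed.

```python
from fractions import Fraction

def weighted_jaccard(knowledge_weights, question_weights):
    """Weighted Jaccard: weight of the shared words over weight of all words."""
    shared = set(knowledge_weights) & set(question_weights)
    all_words = set(knowledge_weights) | set(question_weights)
    numerator = sum(knowledge_weights[w] for w in shared)
    # For a word present in both, the knowledge word weight is used.
    denominator = sum(
        knowledge_weights[w] if w in knowledge_weights else question_weights[w]
        for w in all_words
    )
    return numerator / denominator

third, fifth = Fraction(1, 3), Fraction(1, 5)
knowledge_weights = {"credit card": third, "handle": third, "flow": third}
# "I" and "how" are assumed labels for the problem words carrying the preset weight.
question_weights = {"I": fifth, "how": fifth, "handle": third, "credit card": third}

d_weight = weighted_jaccard(knowledge_weights, question_weights)  # 10/21
d_vector = Fraction(98, 100)          # vector similarity as stated in step (4)
a, b = Fraction(2, 5), Fraction(3, 5)  # a = 0.4, b = 0.6 from step (5)
d_total = a * d_weight + b * d_vector
print(d_weight, float(d_total))
```

Exact fractions are used so that the intermediate value 10/21 matches the text without rounding; under these assumptions the total similarity of this alternative knowledge comes to roughly 0.78.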

According to a second aspect of the application, a knowledge-problem matching device for intelligent answering is further provided. As shown in Fig. 4, the device includes:

a problem acquiring unit 401, for obtaining the problem sent by the client;

a weight similarity acquiring unit 402, for obtaining the weight similarity of each alternative knowledge and the problem using the knowledge words and the problem words;

a vector similarity acquiring unit 403, for obtaining the vector similarity of each alternative knowledge and the problem using the knowledge words and the problem words;

a total similarity calculating unit 404, for calculating the total similarity of each alternative knowledge and the problem using the weight similarity and the vector similarity;

a knowledge-problem matching unit 405, for obtaining the alternative knowledge whose total similarity meets the preset rule as the knowledge matching the problem.

In this application, the knowledge words are obtained as follows:

word segmentation is performed on the alternative knowledge;

In this application, the problem words are obtained as follows:

word segmentation is performed on the problem;

In this application, as shown in Fig. 5, the weight similarity acquiring unit 402 includes:

a knowledge word weight acquiring subunit 4021, for obtaining the weights of the knowledge words in the alternative knowledge;

a problem word weight assigning subunit 4022, for assigning weights to the problem words in the problem according to the preset weight assigning rule;

a weight similarity calculating subunit 4023, for calculating the weight similarity using the weights of the knowledge words and the weights of the problem words.

In this application, the knowledge word weight acquiring subunit 4021 includes:

an ordinary weight acquiring slave unit 40211, for obtaining the weight of each knowledge word, the weight of a knowledge word being the weight of that knowledge word in its own alternative knowledge;

a normalizing slave unit 40212, for normalizing the weight of each knowledge word.

In this application, in the problem word weight assigning subunit 4022, the weight assigning rule is to judge whether a problem word meets the condition for being assigned the preset weight. If it does, the preset weight is assigned to the problem word; if it does not, the weight of the problem word is the average of the weights of all knowledge words identical to the problem word in each alternative knowledge.
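The weight assigning rule can be sketched in Python as follows. The text does not state the condition for assigning the preset weight, so this sketch assumes that a problem word receives the preset weight when it matches no knowledge word in any alternative knowledge; that condition, the function name `assign_question_weights`, and the example weights are all assumptions made for illustration.

```python
from fractions import Fraction

def assign_question_weights(question_words, candidate_weights, preset_weight):
    """Assign a weight to each problem word.

    candidate_weights is a list of dicts, one per alternative knowledge,
    mapping each knowledge word to its (normalized) weight.
    """
    weights = {}
    for word in question_words:
        matches = [cw[word] for cw in candidate_weights if word in cw]
        if not matches:
            # Assumed condition: no identical knowledge word exists.
            weights[word] = preset_weight
        else:
            # Average weight of the identical knowledge words.
            weights[word] = sum(matches) / len(matches)
    return weights

third = Fraction(1, 3)
# Hypothetical knowledge word weights for two alternative knowledge items.
candidate_weights = [
    {"credit card": third, "handle": third, "flow": third},
    {"credit card": third, "cancel": third, "flow": third},
]
question_words = ["I", "how", "handle", "credit card"]
question_weights = assign_question_weights(
    question_words, candidate_weights, preset_weight=Fraction(1, 5)
)
print(question_weights)
```

Under this assumed condition, problem words that also occur in the knowledge base inherit the importance those words have there, while words unknown to the knowledge base contribute only the small preset weight.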

In this application, as shown in Fig. 6, the vector similarity acquiring unit 403 includes:

a knowledge vector acquiring subunit 4031, for obtaining the vector of the alternative knowledge;

a problem vector acquiring subunit 4032, for obtaining the vector of the problem;

a vector similarity calculating subunit 4033, for calculating the vector similarity using the vector of the alternative knowledge and the vector of the problem.

In this application, the knowledge vector acquiring subunit 4031 includes:

a knowledge word term vector acquiring slave unit 40311, for obtaining the term vectors of the knowledge words, the term vector of a knowledge word being the term vector of that knowledge word in the alternative knowledge;

a knowledge vector calculating slave unit 40312, for calculating the vector of the alternative knowledge using the term vectors of the knowledge words.

In this application, the problem vector acquiring subunit 4032 includes:

a problem word term vector acquiring slave unit 40321, for obtaining the term vectors of the problem words, the term vector of a problem word being the same as the term vector of the identical knowledge word;

a problem vector calculating slave unit 40322, for calculating the vector of the problem using the term vectors of the knowledge words.

In this application, the weight similarity is obtained using one of, or a combination of more than one of, Jaccard (Jaccard distance), Hamming distance and edit distance;

the vector similarity is obtained in the cosine manner;

the total similarity of the alternative knowledge and the problem is the sum of the weight similarity and the vector similarity of the same alternative knowledge and the problem.

Fig. 7 shows a block diagram of a computer system 800 on which embodiments may be implemented. The computer system 800 includes a processor 810, a storage medium 820, a system memory 830, a monitor 840, a keyboard 850, a mouse 860, a network interface 870 and a video adapter 880. These components are coupled by a system bus 890.

The storage medium 820 (such as a hard disk) stores multiple programs, including an operating system, application programs and other program modules. A user can enter commands and information into the computer system 800 through input devices such as the keyboard 850, a touch pad (not shown) and the mouse 860. The monitor 840 is used to display text and graphical information.

The operating system runs on the processor 810 and is used to coordinate and provide control of the various components in the personal computer system 800 in Fig. 7. Furthermore, a computer program may be used on the computer system 800 to implement the various embodiments described above.

It will be appreciated that the hardware components shown in Fig. 7 are for illustrative purposes only, and the actual components may vary depending on the computing device deployed to implement the present invention.

In addition, the computer system 800 may be, for example, a desktop computer, a server computer, a laptop computer, or a wireless device such as a mobile phone, a personal digital assistant (PDA) or a handheld computer.

It will be appreciated that embodiments within the scope of the present invention may be implemented in the form of a computer program product that includes computer-executable instructions, such as program code, which can run in any suitable computing environment in conjunction with any suitable operating system, for example Microsoft Windows, Linux or a UNIX operating system. Embodiments within the scope of the present invention may also include a program product that includes a computer-readable medium for carrying or storing computer-executable instructions or data structures. Such a computer-readable medium can be any available medium that can be accessed by a general-purpose or special-purpose computer. For example, such a computer-readable medium may include RAM, ROM, EPROM, EEPROM, CD-ROM, magnetic disk storage or other storage devices, or any other medium that can be used to carry or store the desired program code in the form of computer-executable instructions and that can be accessed by a general-purpose or special-purpose computer.

The method, device and system by which an intelligent answer system extracts answers according to the present invention have the following beneficial effects:

(1) the weights are normalized, so that the weight is more accurate as an evaluation factor;

(2) the two factors of weight and vector are combined to judge the similarity of the problem and the knowledge, so that the similarity judgment is more accurate;

(3) the work of manually correcting problems is reduced, saving enterprises a large amount of labor cost.

The present invention has been described in detail above with reference to specific embodiments and illustrative examples, but these descriptions are not to be construed as limiting the present invention. Those skilled in the art will understand that, without departing from the spirit and scope of the present invention, various equivalent substitutions, modifications or improvements can be made to the technical solution of the present invention and its embodiments, all of which fall within the scope of the present invention. The scope of protection of the present invention is defined by the appended claims.

Claims

1. A knowledge-problem matching method for an intelligent answer system, characterized in that the method comprises:

obtaining the problem sent by a client;

calculating the total similarity of each alternative knowledge and the problem using the weight similarity and the vector similarity;

obtaining the alternative knowledge whose total similarity meets a preset rule as the knowledge matching the problem;

wherein obtaining the weight similarity of each alternative knowledge and the problem using knowledge words and problem words comprises:

obtaining the weights of the knowledge words in the alternative knowledge;

calculating the weight similarity using the weights of the knowledge words and the weights of the problem words;

wherein obtaining the weights of the knowledge words in the alternative knowledge comprises:

normalizing the weight of each knowledge word;

wherein the total similarity is calculated according to the following Formula I:

$$D_{\text{total}} = a \cdot D_{\text{weight}} + b \cdot D_{\text{vector}} \qquad \text{(Formula I)}$$

where $D_{\text{total}}$ denotes the total similarity,

$D_{\text{weight}}$ denotes the weight similarity,

$D_{\text{vector}}$ denotes the vector similarity,

$a$ denotes the first predetermined coefficient,

$b$ denotes the second predetermined coefficient,

and $0 < a < 1$, $a + b = 1$.

2. The method according to claim 1, characterized in that

the knowledge words are obtained as follows:

word segmentation is performed on the alternative knowledge;

the problem words are obtained as follows:

word segmentation is performed on the problem;

3. The method according to claim 1 or 2, characterized in that obtaining the vector similarity of each alternative knowledge and the problem using the knowledge words and the problem words comprises:

obtaining the vector of the alternative knowledge;

obtaining the vector of the problem;

4. A knowledge-problem matching device for an intelligent answer system, characterized in that the device comprises:

a problem acquiring unit, for obtaining the problem sent by a client;

a weight similarity acquiring unit, for obtaining the weight similarity of each alternative knowledge and the problem using knowledge words and problem words;

a vector similarity acquiring unit, for obtaining the vector similarity of each alternative knowledge and the problem using the knowledge words and the problem words;

a total similarity calculating unit, for calculating the total similarity of each alternative knowledge and the problem using the weight similarity and the vector similarity;

a knowledge-problem matching unit, for obtaining the alternative knowledge whose total similarity meets a preset rule as the knowledge matching the problem;

wherein the weight similarity acquiring unit comprises:

a problem word weight assigning subunit, for assigning weights to the problem words in the problem according to a preset weight assigning rule;

a weight similarity calculating subunit, for calculating the weight similarity using the weights of the knowledge words and the weights of the problem words;

wherein the knowledge word weight acquiring subunit comprises:

an ordinary weight acquiring slave unit, for obtaining the weight of each knowledge word, the weight of a knowledge word being the weight of that knowledge word in its own alternative knowledge;

a normalizing slave unit, for normalizing the weight of each knowledge word;

wherein the total similarity is calculated according to the following Formula I:

$$D_{\text{total}} = a \cdot D_{\text{weight}} + b \cdot D_{\text{vector}} \qquad \text{(Formula I)}$$

where $D_{\text{total}}$ denotes the total similarity,

$D_{\text{weight}}$ denotes the weight similarity,

$D_{\text{vector}}$ denotes the vector similarity,

$a$ denotes the first predetermined coefficient,

$b$ denotes the second predetermined coefficient,

and $0 < a < 1$, $a + b = 1$.

5. The device according to claim 4, characterized in that

the knowledge words are obtained as follows:

word segmentation is performed on the alternative knowledge;

the problem words are obtained as follows:

word segmentation is performed on the problem;

6. The device according to claim 4 or 5, characterized in that the vector similarity acquiring unit comprises:

a vector similarity calculating subunit, for calculating the vector similarity using the vector of the alternative knowledge and the vector of the problem.