Summary of the Invention
This application provides a knowledge-problem matching method and device for an intelligent question answering system, to solve the problem that inaccurate matching between problems and knowledge in such a system leads to a low accuracy rate of the extracted answers.
The present invention provides the following aspects:
In a first aspect, this application provides a knowledge-problem matching method for an intelligent question answering system, the method including:
Obtaining a problem sent by a client;
Obtaining, using knowledge words and problem words, the weight similarity between each item of alternative knowledge and the problem;
Obtaining, using the knowledge words and the problem words, the vector similarity between each item of alternative knowledge and the problem;
Calculating, using the weight similarity and the vector similarity, the total similarity between each item of alternative knowledge and the problem;
Obtaining the alternative knowledge whose total similarity meets a preset rule, as the knowledge that matches the problem.
Optionally, before obtaining the weight similarity between each item of alternative knowledge and the problem, the method further includes:
Generating a knowledge base that includes at least one item of alternative knowledge;
Preprocessing the knowledge: performing word segmentation on the alternative knowledge and removing the stop words from the word segmentation result, to obtain the knowledge words in the alternative knowledge.
Optionally, the knowledge words are obtained as follows:
Performing word segmentation on the alternative knowledge;
Removing the stop words from the word segmentation result, to obtain the knowledge words in the alternative knowledge.
Optionally, the problem words are obtained as follows:
Performing word segmentation on the problem;
Removing the stop words from the word segmentation result, to obtain the problem words in the problem.
Optionally, obtaining the weight similarity between each item of alternative knowledge and the problem using the knowledge words and the problem words includes:
Obtaining the weights of the knowledge words in the alternative knowledge;
Assigning weights to the problem words in the problem according to a preset weight assignment rule;
Calculating the weight similarity using the weights of the knowledge words and the weights of the problem words.
Optionally, obtaining the weights of the knowledge words in the alternative knowledge includes:
Obtaining the weight of each knowledge word, the weight of a knowledge word being its weight in the item of alternative knowledge that contains it;
Normalizing the weight of each knowledge word.
Optionally, the weight assignment rule is to judge whether a problem word meets the condition for assigning a preset weight; if it does, the preset weight is assigned to the problem word;
If it does not, the weight of the problem word is the average of the weights, in each item of alternative knowledge, of all knowledge words identical to the problem word;
The condition for assigning the preset weight is that the knowledge words do not include the problem word.
Optionally, obtaining the vector similarity between each item of alternative knowledge and the problem using the knowledge words and the problem words includes:
Obtaining the vector of the alternative knowledge;
Obtaining the vector of the problem;
Calculating the vector similarity using the vector of the alternative knowledge and the vector of the problem.
Optionally, obtaining the vector of the alternative knowledge includes:
Obtaining the term vectors of the knowledge words, the term vector of a knowledge word being its term vector in the alternative knowledge;
Calculating the vector of the alternative knowledge using the term vectors of the knowledge words.
Optionally, obtaining the vector of the problem includes:
Obtaining the term vectors of the problem words, the term vector of a problem word being identical to the term vector of the identical knowledge word;
Calculating the vector of the problem using the term vectors of the problem words.
Optionally, the weight similarity is obtained using one of, or a combination of, the Jaccard distance, the Hamming distance and the edit distance;
The vector similarity is obtained using the cosine method;
The total similarity between an item of alternative knowledge and the problem is the linearly weighted sum of the weight similarity and the vector similarity between that same item of alternative knowledge and the problem.
Optionally, the preset rule is to sort the total similarities between all items of alternative knowledge and the problem and select the item with the largest total similarity.
In the knowledge-problem matching process of an intelligent question answering system, this application combines two similarity evaluation schemes, weight similarity and vector similarity, which compensates for the systematic error of any single similarity evaluation method. Moreover, before calculating the weight similarity and the vector similarity, the scheme of this application preprocesses the word segmentation result by removing its stop words, which reduces the false-trigger rate. In addition, the weights of the knowledge words obtained after preprocessing are normalized so that their values fall in the range [0, 1]. This reduces the deviation in the weight similarity calculation caused by large differences between the weights of different knowledge words, makes the weight similarity between the problem and the alternative knowledge more accurate, improves the accuracy of the total similarity, and thereby further improves the accuracy of knowledge-problem matching in the intelligent question answering system.
In a second aspect, this application further provides a knowledge-problem matching device for an intelligent question answering system, the device including:
A problem acquiring unit, for obtaining a problem sent by a client;
A weight similarity acquiring unit, for obtaining, using knowledge words and problem words, the weight similarity between each item of alternative knowledge and the problem;
A vector similarity acquiring unit, for obtaining, using the knowledge words and the problem words, the vector similarity between each item of alternative knowledge and the problem;
A total similarity calculating unit, for calculating, using the weight similarity and the vector similarity, the total similarity between each item of alternative knowledge and the problem;
A knowledge-problem matching unit, for obtaining the alternative knowledge whose total similarity meets a preset rule, as the knowledge that matches the problem.
Optionally, the knowledge words are obtained as follows:
Performing word segmentation on the alternative knowledge;
Removing the stop words from the word segmentation result, to obtain the knowledge words in the alternative knowledge.
Optionally, the problem words are obtained as follows:
Performing word segmentation on the problem;
Removing the stop words from the word segmentation result, to obtain the problem words in the problem.
Optionally, the weight similarity acquiring unit includes:
A knowledge word weight acquiring subunit, for obtaining the weights of the knowledge words in the alternative knowledge;
A problem word weight assignment subunit, for assigning weights to the problem words in the problem according to a preset weight assignment rule;
A weight similarity calculating subunit, for calculating the weight similarity using the weights of the knowledge words and the weights of the problem words.
Optionally, the knowledge word weight acquiring subunit includes:
A common weight acquiring sub-subunit, for obtaining the weight of each knowledge word, the weight of a knowledge word being its weight in the item of alternative knowledge that contains it;
A normalization sub-subunit, for normalizing the weight of each knowledge word.
Optionally, in the problem word weight assignment subunit, the weight assignment rule is to judge whether a problem word meets the condition for assigning a preset weight; if it does, the preset weight is assigned to the problem word; if it does not, the weight of the problem word is the average of the weights, in each item of alternative knowledge, of all knowledge words identical to the problem word;
The condition for assigning the preset weight is that the knowledge words do not include the problem word.
Optionally, the vector similarity acquiring unit includes:
A knowledge vector acquiring subunit, for obtaining the vector of the alternative knowledge;
A problem vector acquiring subunit, for obtaining the vector of the problem;
A vector similarity calculating subunit, for calculating the vector similarity using the vector of the alternative knowledge and the vector of the problem.
Optionally, the knowledge vector acquiring subunit includes:
A knowledge word term vector acquiring sub-subunit, for obtaining the term vectors of the knowledge words, the term vector of a knowledge word being its term vector in the alternative knowledge;
A knowledge vector calculating sub-subunit, for calculating the vector of the alternative knowledge using the term vectors of the knowledge words.
Optionally, the problem vector acquiring subunit includes:
A problem word term vector acquiring sub-subunit, for obtaining the term vectors of the problem words, the term vector of a problem word being identical to the term vector of the identical knowledge word;
A problem vector calculating sub-subunit, for calculating the vector of the problem using the term vectors of the problem words.
Optionally, the weight similarity is obtained using one of, or a combination of, the Jaccard distance, the Hamming distance and the edit distance;
The vector similarity is obtained using the cosine method;
The total similarity between an item of alternative knowledge and the problem is the linearly weighted sum of the weight similarity and the vector similarity between that same item of alternative knowledge and the problem;
The preset rule is to sort the total similarities between all items of alternative knowledge and the problem and select the item with the largest total similarity.
Detailed Description
The present invention is described in detail below; its features and advantages will become clearer from this description.
According to the first aspect of this application, a knowledge-problem matching method for an intelligent question answering system is provided. As shown in Fig. 1, the method includes:
S101: obtain the problem sent by the client;
S102: obtain, using knowledge words and problem words, the weight similarity between each item of alternative knowledge and the problem;
In this application, the knowledge words are the word segmentation result of the alternative knowledge, and the problem words are the word segmentation result of the problem sent by the client.
S103: obtain, using the knowledge words and the problem words, the vector similarity between each item of alternative knowledge and the problem;
S104: calculate, using the weight similarity and the vector similarity, the total similarity between each item of alternative knowledge and the problem;
S105: obtain the alternative knowledge whose total similarity meets a preset rule, as the knowledge that matches the problem.
In this application, the knowledge words are obtained as follows:
Performing word segmentation on the alternative knowledge;
Removing the stop words from the word segmentation result, to obtain the knowledge words in the alternative knowledge.
In this application, the problem words are obtained as follows:
Performing word segmentation on the problem;
Removing the stop words from the word segmentation result, to obtain the problem words in the problem.
In this application, an item of alternative knowledge includes one standard question and, optionally, extended questions, where an extended question is a different form of expression of the standard question that conveys the same meaning. Taking a banking service that explains how to handle a credit card as an example, the alternative knowledge related to "how to handle a credit card" stored in the knowledge base includes "credit card handling flow", "where can I handle a credit card", "credit card handling steps" and so on. One of these problems is used as the standard question and the others are used as extended questions. In this embodiment, for example, the problem listed first, "credit card handling flow", may be used as the standard question and the other problems as its extended questions; in other embodiments, another problem may be designated as the standard question.
It should be noted that the standard question and the extended questions may take the form of semantic expressions or the form of specific interrogative sentences; both fall within the protection scope of the present invention.
In this application, the knowledge words of each item of alternative knowledge are the result after deduplication; that is, within the word segmentation result of a single item of alternative knowledge, identical entries are counted as one entry. For example, for the alternative knowledge "I want to handle a credit card, how do I handle a China Merchants Bank credit card", the word segmentation result is "I", "handle", "credit card", "I", "how", "handle", "China Merchants Bank", "credit card". Although this item of alternative knowledge contains two occurrences each of the strings "I", "handle" and "credit card", its knowledge words are "I", "handle", "credit card", "how" and "China Merchants Bank".
Stop-word removal works as follows: a stop-word list is established in advance; when stop words are removed, each entry in the word segmentation result is matched against the words in the stop-word list, and if the entry is present in the list, it is deleted from the string of entries produced by Chinese word segmentation.
Stop words as used here are words with no practical meaning, such as modal particles or structural particles.
The inventors found that removing the stop words removes noise from the problem, which makes the similarity between the alternative knowledge and the problem more accurate, improves the accuracy of knowledge-problem matching, and in turn improves the accuracy of the answers given by the intelligent question answering system.
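The deduplication and stop-word removal described above can be sketched as follows. This is a minimal illustration, not the application's implementation: it assumes the text has already been segmented into tokens (the segmenter itself is out of scope here), and the function name `extract_words` and the contents of `STOP_WORDS` are hypothetical.

```python
# Minimal sketch: turn a segmented token list into knowledge/problem words.
# STOP_WORDS is a hypothetical stop-word list; a real list would be built in advance.
STOP_WORDS = {"ah", "of"}


def extract_words(tokens, stop_words):
    """Drop stop words and duplicate entries, keeping first-seen order."""
    seen = set()
    words = []
    for token in tokens:
        if token in stop_words or token in seen:
            continue
        seen.add(token)
        words.append(token)
    return words


# Segmentation result from the example above, plus one hypothetical stop word "ah".
tokens = ["I", "handle", "credit card", "ah", "I", "how",
          "handle", "China Merchants Bank", "credit card"]
knowledge_words = extract_words(tokens, STOP_WORDS)
print(knowledge_words)
# ['I', 'handle', 'credit card', 'how', 'China Merchants Bank']
```

The same function can be applied to the segmented problem to obtain the problem words, since the text states that both are processed in the same way.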
In this application, steps S111 and S112 may be performed either before or after step S101.
In S101 of this application, obtaining the problem sent by the client means obtaining the text of the problem sent by the client.
In this application, as shown in Fig. 1, the order of steps S102 and S103 is not specifically limited and may be arbitrary; that is, the two steps may be performed in the order S102-S103 or S103-S102.
In this application, the weight similarity refers to the similarity between the knowledge and the problem calculated from the weights of the knowledge words and the weights of the problem words.
In S102 of this application, as shown in Fig. 2, obtaining the weight similarity between each item of alternative knowledge and the problem using the knowledge words and the problem words includes:
S201: obtain the weights of the knowledge words in the alternative knowledge;
S202: assign weights to the problem words in the problem according to a preset weight assignment rule;
S203: calculate the weight similarity using the weights of the knowledge words and the weights of the problem words.
In S201 of this application, obtaining the weights of the knowledge words in the alternative knowledge includes:
S2011: obtain the weight of each knowledge word, the weight of a knowledge word being its weight in the item of alternative knowledge that contains it;
S2012: normalize the weight of each knowledge word.
In S2011 of this application, the weight of each knowledge word is obtained by the tf-idf (term frequency-inverse document frequency) method.
In S2012 of this application, the weight of each knowledge word is normalized so that its value falls in the range [0, 1].
The inventors found that the weights obtained by the tf-idf method span a wide range, which can reach, for example, [1, 4000]. If these raw weights were used as the weights of the knowledge words, knowledge words with small weights would contribute approximately 0 when the similarity is calculated, so that the calculated weight similarity would differ greatly from the true similarity; that is, the weight similarity would be seriously distorted, which would in turn lower the accuracy of the answers provided by the intelligent question answering system.
This application normalizes the weight of each knowledge word so that its value falls in the range [0, 1], which both preserves the distribution characteristics of the knowledge word weights and narrows the gaps between them, so that the similarity between the alternative knowledge and the problem is calculated more reasonably and accurately.
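The normalization step can be illustrated with a short sketch. The text does not state which normalization formula is used, so dividing each weight by the sum of the weights within one item of alternative knowledge is an assumption made here; min-max or max scaling would also map the weights into [0, 1]. The function name `normalize_weights` and the raw scores are hypothetical.

```python
# Minimal sketch of S2012, assuming sum-normalization within one knowledge item.
def normalize_weights(raw_weights):
    """Scale raw (e.g. tf-idf) weights of one knowledge item into [0, 1]."""
    total = sum(raw_weights.values())
    if total == 0:
        # Degenerate case: no information, so every word gets weight 0.
        return {word: 0.0 for word in raw_weights}
    return {word: value / total for word, value in raw_weights.items()}


# Hypothetical raw tf-idf scores spanning a wide range.
raw = {"credit card": 1200.0, "handle": 300.0, "flow": 1500.0}
normalized = normalize_weights(raw)
for word, weight in normalized.items():
    print(word, round(weight, 3))
```

With this choice the relative ordering of the weights is preserved while every value lies in [0, 1] and the weights of one item sum to 1.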
In S202 of this application, the weight assignment rule is to judge whether a problem word meets the condition for assigning a preset weight; if it does, the preset weight is assigned to the problem word; if it does not, the weight of the problem word is the average of the weights, in each item of alternative knowledge, of all knowledge words identical to the problem word.
In this application, the condition for assigning the preset weight is that the knowledge words do not include the problem word.
In this application, all problem words that meet the condition for assigning the preset weight receive the same preset weight. For example, if the obtained problem is "how do I handle a credit card", the result after word segmentation and stop-word removal is "I", "how", "handle", "credit card"; if the knowledge words include only "how", "handle" and "credit card", that is, the knowledge words do not include the problem word "I", the problem word "I" is assigned the preset weight (for example, 0.2).
In S2023 of this application, performing word segmentation on different items of alternative knowledge may yield identical knowledge words, and the weights of such an identical knowledge word in different items of alternative knowledge may be the same or may differ. When they differ, assigning any single one of these weights to the problem word would be one-sided and inaccurate, whereas the average weight of the knowledge word over all items of alternative knowledge is generally representative and makes both the knowledge weights and the problem weights more accurate.
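The weight assignment rule for problem words can be sketched as below. This is an illustration under stated assumptions: the average is taken over the items of alternative knowledge that actually contain the word, every knowledge word is given the normalized weight 1/3 (the per-item weight tables are not reproduced in this text), and the function name `assign_problem_weights` is hypothetical.

```python
from fractions import Fraction


def assign_problem_weights(problem_words, knowledge_weights, preset_weight):
    """Assign a weight to each problem word.

    knowledge_weights: list of dicts, one per item of alternative knowledge,
    mapping knowledge word -> normalized weight.
    """
    assigned = {}
    for word in problem_words:
        found = [weights[word] for weights in knowledge_weights if word in weights]
        if not found:
            # The word is not a knowledge word: use the preset weight.
            assigned[word] = preset_weight
        else:
            # Average of the word's weight over the items that contain it.
            assigned[word] = sum(found) / len(found)
    return assigned


third = Fraction(1, 3)  # assumed normalized weight of every knowledge word
knowledge_weights = [
    {"credit card": third, "handle": third, "flow": third},
    {"credit card": third, "cancel": third, "flow": third},
    {"transit card": third, "handle": third, "flow": third},
    {"transit card": third, "cancel": third, "flow": third},
]
problem_weights = assign_problem_weights(
    ["I", "how", "handle", "credit card"], knowledge_weights, Fraction(1, 5)
)
print(problem_weights)
```

Under these assumptions "I" and "how" receive the preset weight 1/5, while "handle" and "credit card" receive the averaged weight 1/3.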
In one embodiment of this application, the alternative knowledge is "credit card handling flow", "credit card cancellation flow", "transit card handling flow" and "transit card cancellation flow"; the average weight of each knowledge word over the items of alternative knowledge is then obtained as follows:
The average weight of each knowledge word over the items of alternative knowledge is, respectively:
In this application, before the weights of the problem words in the problem are obtained in S202, the method further includes:
S221: perform word segmentation on the problem;
S222: remove the stop words from the word segmentation result, to obtain the problem words in the problem.
In this application, word segmentation is performed on the problem in the same way as on the alternative knowledge.
In this application, stop words are removed from the word segmentation result of the problem in the same way as from the word segmentation result of the alternative knowledge.
In S203 of this application, the weight similarity is calculated from the weights of the knowledge words and the weights of the problem words using one of, or a combination of, the Jaccard distance, the Hamming distance and the edit distance.
In one embodiment of this application, the weight similarity between the alternative knowledge "credit card handling flow" and the problem "how do I handle a credit card" is calculated as follows:
Let set A be the set of problem words with their weights and set B be the set of knowledge words with their weights. Set A is then "I", "how", "handle" and "credit card", with weights 1/5, 1/5, 1/3 and 1/3 respectively ("I" and "how" are not knowledge words and therefore receive the preset weight, here 1/5), and set B is "credit card", "handle" and "flow", with weights 1/3, 1/3 and 1/3 respectively. The weight similarity between the problem and the alternative knowledge is:
$\mathrm{Jaccard}(A, B) = |A \cap B| \,/\, |A \cup B|$
where $\mathrm{Jaccard}(A, B)$ denotes the weight similarity of sets A and B;
$|A \cap B|$ denotes the sum of the weights of the intersection of A and B;
$|A \cup B|$ denotes the sum of the weights of the union of A and B.
In this embodiment, the intersection of A and B is "handle" and "credit card", with weights 1/3 and 1/3, and the union of A and B is "I", "how", "handle", "credit card" and "flow", with weights 1/5, 1/5, 1/3, 1/3 and 1/3.
Then $\mathrm{Jaccard}(A, B) = (1/3 + 1/3) / (1/5 + 1/5 + 1/3 + 1/3 + 1/3) = (2/3) / (7/5) = 10/21$,
That is, the weight similarity between the alternative knowledge and the problem is 10/21.
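The weighted Jaccard calculation can be checked with a small sketch. Two assumptions are made and labeled here: the problem word "how" carries the preset weight 1/5 (like "I", it is not a knowledge word), which is what yields the stated result 10/21; and when a word has different weights in the two sets, the smaller weight counts toward the intersection and the larger toward the union, a convention the text does not specify. The function name `weighted_jaccard` is hypothetical.

```python
from fractions import Fraction


def weighted_jaccard(a, b):
    """Weighted Jaccard similarity of two word -> weight mappings."""
    shared = a.keys() & b.keys()
    union = a.keys() | b.keys()
    # Assumed convention: min weight for shared words, max weight for the union.
    inter_sum = sum(min(a[w], b[w]) for w in shared)
    union_sum = sum(max(a.get(w, 0), b.get(w, 0)) for w in union)
    if union_sum == 0:
        return Fraction(0)
    return Fraction(inter_sum) / union_sum


# Set A: problem words; set B: knowledge words of "credit card handling flow".
A = {"I": Fraction(1, 5), "how": Fraction(1, 5),
     "handle": Fraction(1, 3), "credit card": Fraction(1, 3)}
B = {"credit card": Fraction(1, 3), "handle": Fraction(1, 3),
     "flow": Fraction(1, 3)}

print(weighted_jaccard(A, B))  # 10/21
```

Exact fractions are used so that the result can be compared directly with the hand calculation: the intersection sums to 2/3 and the union to 7/5.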
In S103 of this application, as shown in Fig. 3, obtaining the vector similarity between each item of alternative knowledge and the problem using the knowledge words and the problem words includes:
S301: obtain the vector of the alternative knowledge;
S302: obtain the vector of the problem;
S303: calculate the vector similarity using the vector of the alternative knowledge and the vector of the problem.
In this application, the vector similarity refers to the similarity between the knowledge and the problem calculated from the vectors of the knowledge words and the vectors of the problem words.
In S301 of this application, obtaining the vector of the alternative knowledge includes:
S3011: obtain the term vectors of the knowledge words, the term vector of a knowledge word being its term vector in the alternative knowledge;
S3012: calculate the vector of the alternative knowledge using the term vectors of the knowledge words.
In this application, the dimension of the term vectors of the problem words is the same as the dimension of the term vectors of the knowledge words.
The term vectors of identical knowledge words in the knowledge base are identical, and the term vector of a knowledge word is obtained by one of word2vec (word to vector, i.e. word embedding) or one-hot (one-hot encoding).
In this application, the vector of an item of alternative knowledge is the average term vector of all knowledge words in that item, that is, the vector obtained by averaging all of its knowledge words along each dimension. For example, for the alternative knowledge "credit card handling flow", the word segmentation result is "credit card", "handle" and "flow", whose vector representations are respectively:
"credit card": [8/10, 1/10, 1/10]
"handle": [3/10, 6/10, 1/10]
"flow": [4/10, 2/10, 4/10]
The vector of the alternative knowledge is then [(8/10+3/10+4/10)/3, (1/10+6/10+2/10)/3, (1/10+1/10+4/10)/3] = [1/2, 3/10, 1/5].
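The per-dimension averaging can be reproduced with a few lines. This is a sketch using the three example term vectors given in the text; the function name `average_vector` is hypothetical, and exact fractions are used so the result matches the hand calculation.

```python
from fractions import Fraction


def average_vector(term_vectors):
    """Average a list of equal-length term vectors dimension by dimension."""
    count = len(term_vectors)
    return [sum(column) / count for column in zip(*term_vectors)]


# Term vectors of "credit card", "handle" and "flow" from the example.
knowledge_term_vectors = [
    [Fraction(8, 10), Fraction(1, 10), Fraction(1, 10)],
    [Fraction(3, 10), Fraction(6, 10), Fraction(1, 10)],
    [Fraction(4, 10), Fraction(2, 10), Fraction(4, 10)],
]
knowledge_vector = average_vector(knowledge_term_vectors)
print(knowledge_vector)  # [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
```

The same function applies to the problem, whose vector is obtained in the same way from the term vectors of its problem words.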
In S302 of this application, obtaining the vector of the problem includes:
S3021: obtain the term vectors of the problem words, the term vector of a problem word being identical to the term vector of the identical knowledge word;
S3022: calculate the vector of the problem using the term vectors of the problem words.
In this application, the vector of the problem is obtained in the same way as the vector of the alternative knowledge.
In S303 of this application, the vector similarity is calculated from the vector of the alternative knowledge and the vector of the problem, and is obtained using the cosine method.
In one embodiment of this application, the cosine method is used to illustrate how the vector similarity is obtained. For example, to calculate the vector similarity between the alternative knowledge "credit card handling flow" and the problem "how do I handle a credit card", the term vectors of the knowledge words and problem words are set to be three-dimensional. Word segmentation and stop-word removal are performed on the alternative knowledge, giving:
"credit card": [8/10, 1/10, 1/10]
"handle": [3/10, 6/10, 1/10]
"flow": [4/10, 2/10, 4/10]
The vector of the alternative knowledge, denoted A, is then [(8/10+3/10+4/10)/3, (1/10+6/10+2/10)/3, (1/10+1/10+4/10)/3] = [1/2, 3/10, 1/5], that is, A = [1/2, 3/10, 1/5];
Word segmentation and stop-word removal are performed on the problem, giving:
The vector of the problem, denoted B, is then [(1/10+5/10+3/10+8/10)/4, (3/10+4/10+6/10+1/10)/4, (6/10+1/10+1/10+1/10)/4] = [17/40, 14/40, 9/40], that is, B = [17/40, 14/40, 9/40];
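The vector similarity between the alternative knowledge and the problem is then the cosine of the angle $\theta$ between A and B. Specifically, with A = [1/2, 3/10, 1/5] and B = [17/40, 14/40, 9/40], the similarity of A and B is:
$$\cos\theta = \frac{\frac{17}{40}\cdot\frac{1}{2} + \frac{14}{40}\cdot\frac{3}{10} + \frac{9}{40}\cdot\frac{1}{5}}{\sqrt{\left(\left(\frac{17}{40}\right)^{2} + \left(\frac{14}{40}\right)^{2} + \left(\frac{9}{40}\right)^{2}\right)\left(\left(\frac{1}{2}\right)^{2} + \left(\frac{3}{10}\right)^{2} + \left(\frac{1}{5}\right)^{2}\right)}} \approx \frac{98}{100},$$
That is, the vector similarity between the alternative knowledge and the problem is 98/100.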
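The cosine calculation can be verified numerically. This is a minimal sketch using only the standard library; the function name `cosine_similarity` is hypothetical, and returning 0.0 for a zero-length vector is a choice made here rather than something the text specifies.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for an all-zero vector
    return dot / (norm_u * norm_v)


A = [1 / 2, 3 / 10, 1 / 5]        # vector of the alternative knowledge
B = [17 / 40, 14 / 40, 9 / 40]    # vector of the problem
similarity = cosine_similarity(A, B)
print(round(similarity, 4))
```

Evaluated to more digits, the value is slightly above 0.988, which agrees with the 98/100 used in the example when cut to two decimal places.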
In S104 of this application, calculating the total similarity between each item of alternative knowledge and the problem using the weight similarity and the vector similarity means finding the linearly weighted sum of the weight similarity and the vector similarity between the current item of alternative knowledge and the problem. That is, the weight similarity is assigned a first preset coefficient and the vector similarity a second preset coefficient; the product of the weight similarity and the first preset coefficient and the product of the vector similarity and the second preset coefficient are calculated; and the total similarity is the sum of the two products.
In S104 of this application, the total similarity is calculated according to the following Formula I:
$$D_{\text{total}} = a \cdot D_{\text{weight}} + b \cdot D_{\text{vector}} \qquad \text{(Formula I)}$$
where $D_{\text{total}}$ denotes the total similarity,
$D_{\text{weight}}$ denotes the weight similarity,
$D_{\text{vector}}$ denotes the vector similarity,
$a$ denotes the first preset coefficient,
$b$ denotes the second preset coefficient,
and $0 < a < 1$, $a + b = 1$.
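Formula I can be sketched as a small function. The function name `total_similarity` is hypothetical; the coefficient values a = 0.4 and b = 0.6 and the inputs 10/21 and 98/100 are taken from the worked example in this description.

```python
def total_similarity(weight_sim, vector_sim, a=0.4):
    """Linearly weighted sum of the two similarities, with b = 1 - a."""
    if not 0 < a < 1:
        raise ValueError("the first preset coefficient must satisfy 0 < a < 1")
    b = 1 - a
    return a * weight_sim + b * vector_sim


# Weight similarity 10/21 and vector similarity 98/100 from the example.
d_total = total_similarity(10 / 21, 0.98, a=0.4)
print(round(d_total, 4))
```

Because a + b = 1 and both inputs lie in [0, 1], the total similarity also lies in [0, 1], so totals for different items of alternative knowledge are directly comparable.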
In S105 of this application, the preset rule is to sort the total similarities between all items of alternative knowledge and the problem and select the item with the largest total similarity.
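The selection rule can be sketched as a sort followed by taking the first entry. The total similarity values below are hypothetical placeholders chosen only for illustration (they are not the values of Table 1), and the function name `best_match` is likewise an assumption.

```python
def best_match(total_similarities):
    """Sort knowledge items by total similarity and return the best one."""
    ranked = sorted(total_similarities.items(),
                    key=lambda item: item[1], reverse=True)
    return ranked[0][0], ranked


# Hypothetical total similarities for four items of alternative knowledge.
scores = {
    "credit card handling flow": 0.78,
    "credit card cancellation flow": 0.55,
    "transit card handling flow": 0.52,
    "transit card cancellation flow": 0.30,
}
matched, ranking = best_match(scores)
print(matched)
```

Keeping the full ranking, rather than only the maximum, makes it easy to adopt a different preset rule later, such as returning the top few items.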
To aid understanding of the method by which the intelligent question answering system of the present invention extracts knowledge, a specific embodiment is set out below.
The problem is "how do I handle a credit card", and the alternative knowledge is "credit card handling flow", "credit card cancellation flow", "transit card handling flow" and "transit card cancellation flow". The process of matching the alternative knowledge with the problem is then:
(1) Word segmentation is performed on the alternative knowledge and the stop words are removed to obtain the knowledge words; tf-idf is then used to calculate the weight of each knowledge word in its item of alternative knowledge, with the following result:
(2) According to the result of (1), the average weight of each knowledge word over the items of alternative knowledge is calculated as, respectively:
(3) Word segmentation is performed on the problem and the stop words are removed to obtain the problem words, and weights are assigned to the problem words according to the preset weight assignment rule. In this embodiment the preset weight is 1/5, and the weights assigned to the problem words are as follows:
Then the weight similarity between the alternative knowledge "credit card handling flow" and the problem "how do I handle a credit card" is: $\mathrm{Jaccard}(A, B) = |A \cap B| / |A \cup B| = (1/3 + 1/3) / (1/5 + 1/5 + 1/3 + 1/3 + 1/3) = (2/3) / (7/5) = 10/21$; that is, the weight similarity between this item of alternative knowledge and the problem is 10/21. The weight similarities between the remaining items of alternative knowledge and the problem are calculated in turn in the same way, and the results are shown in Table 1 below;
(4) In this embodiment, the term vectors of the knowledge words and problem words are set to be three-dimensional, and word2vec is used to obtain the term vector of each knowledge word and problem word. Taking the alternative knowledge "credit card handling flow" as an example, the vector similarity between the alternative knowledge and the problem is calculated as follows:
The term vectors of the knowledge words of this item of alternative knowledge are, in order:
"credit card": [8/10, 1/10, 1/10]
"handle": [3/10, 6/10, 1/10]
"flow": [4/10, 2/10, 4/10],
The vector of the alternative knowledge, denoted A, is then [(8/10+3/10+4/10)/3, (1/10+6/10+2/10)/3, (1/10+1/10+4/10)/3] = [1/2, 3/10, 1/5], that is, A = [1/2, 3/10, 1/5];
Word segmentation and stop-word removal are performed on the problem, giving:
The vector of the problem, denoted B, is then [(1/10+5/10+3/10+8/10)/4, (3/10+4/10+6/10+1/10)/4, (6/10+1/10+1/10+1/10)/4] = [17/40, 14/40, 9/40], that is, B = [17/40, 14/40, 9/40];
The vector similarity between the alternative knowledge and the problem is then the cosine of the angle $\theta$ between A and B. Specifically, with A = [1/2, 3/10, 1/5] and B = [17/40, 14/40, 9/40], the similarity of A and B is:
$$\cos\theta = \frac{\frac{17}{40}\cdot\frac{1}{2} + \frac{14}{40}\cdot\frac{3}{10} + \frac{9}{40}\cdot\frac{1}{5}}{\sqrt{\left(\left(\frac{17}{40}\right)^{2} + \left(\frac{14}{40}\right)^{2} + \left(\frac{9}{40}\right)^{2}\right)\left(\left(\frac{1}{2}\right)^{2} + \left(\frac{3}{10}\right)^{2} + \left(\frac{1}{5}\right)^{2}\right)}} \approx \frac{98}{100},$$
That is, the vector similarity between this item of alternative knowledge and the problem is 98/100. The vector similarities between the remaining items of alternative knowledge and the problem are calculated in turn in the same way, and the results are shown in Table 1 below;
(5) The linearly weighted sum of the weight similarity and the vector similarity between each item of alternative knowledge and the problem is calculated, where the first preset coefficient is a = 0.4 and the second preset coefficient is b = 0.6; the results are shown in Table 1.
Table 1. Similarity results between the alternative knowledge and the problem
(6) Comparing the "total similarity" values in Table 1, the item of alternative knowledge with the largest value is "credit card handling flow"; that is, the matching result is "credit card handling flow".
According to the second aspect of this application, a knowledge-problem matching device for an intelligent question answering system is further provided. As shown in Fig. 4, the device includes:
A problem acquiring unit 401, for obtaining a problem sent by a client;
A weight similarity acquiring unit 402, for obtaining, using knowledge words and problem words, the weight similarity between each item of alternative knowledge and the problem;
A vector similarity acquiring unit 403, for obtaining, using the knowledge words and the problem words, the vector similarity between each item of alternative knowledge and the problem;
A total similarity calculating unit 404, for calculating, using the weight similarity and the vector similarity, the total similarity between each item of alternative knowledge and the problem;
A knowledge-problem matching unit 405, for obtaining the alternative knowledge whose total similarity meets a preset rule, as the knowledge that matches the problem.
In the present application, the knowledge words are obtained as follows:
Word segmentation is performed on the alternative knowledge;
The stop words are removed from the word segmentation result, to obtain the knowledge words in the alternative knowledge;
In the present application, the problem words are obtained as follows:
Word segmentation is performed on the problem;
The stop words are removed from the word segmentation result, to obtain the problem words in the problem.
In the present application, as shown in Fig. 5, the weight similarity acquiring unit 402 includes:
A knowledge word weight acquisition subelement 4021, for obtaining the weights of the knowledge words in the alternative knowledge;
A problem word weight assigning subelement 4022, for assigning weights to the problem words in the problem according to a preset weight
assigning rule;
A weight similarity calculation subelement 4023, for calculating the weight similarity using the weights of the knowledge words and
the weights of the problem words.
In the present application, the knowledge word weight acquisition subelement 4021 includes:
An ordinary weight acquisition subordinate unit 40211, for obtaining the weight of each knowledge word, where the weight of a knowledge word is
its weight within the alternative knowledge to which it belongs;
A normalization subordinate unit 40212, for normalizing the weight of each knowledge word.
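The normalization step can be sketched in Python as follows. The normalization formula is not restated in this passage, so the sketch assumes the common choice of dividing each weight by the sum of the weights within the same alternative knowledge; the function name and the sample weights are hypothetical.

```python
def normalize_weights(weights):
    """Scale the weights of one alternative knowledge so that they sum to 1.

    Assumption: sum normalization. Other schemes (for example min-max
    scaling) would be equally compatible with the text.
    """
    total = sum(weights.values())
    if total == 0:
        # No weight mass to distribute; return zeros rather than divide by zero.
        return {word: 0.0 for word in weights}
    return {word: value / total for word, value in weights.items()}


# Hypothetical raw weights of the knowledge words of one alternative knowledge.
raw_weights = {"credit": 3.0, "card": 2.0, "process": 5.0}
normalized = normalize_weights(raw_weights)
```

After this step the weights of different alternative knowledge are on the same scale, which is what allows them to be compared and averaged.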
In the present application, in the problem word weight assigning subelement 4022, the weight assigning rule is: judge whether
a problem word meets the condition for assigning a preset weight; if it does, the preset weight is assigned to the problem word; if it does not,
the weight of the problem word is the average of the weights of all knowledge words identical to that problem word across the
alternative knowledge;
The condition for assigning the preset weight is that the knowledge words do not include the problem word.
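The weight assigning rule can be illustrated with a minimal Python sketch. The preset weight value, the function name and the sample data are hypothetical; the sketch only follows the rule as stated, that is, a preset weight when no knowledge word equals the problem word, and otherwise the average of the matching knowledge word weights.

```python
def problem_word_weight(word, knowledge_weights, preset_weight):
    """Assign a weight to one problem word.

    knowledge_weights: one dict per alternative knowledge, mapping each
    knowledge word to its (normalized) weight.
    """
    matches = [weights[word] for weights in knowledge_weights if word in weights]
    if not matches:
        # The knowledge words do not include the problem word.
        return preset_weight
    return sum(matches) / len(matches)


# Hypothetical normalized weights for two alternative knowledge entries.
knowledge_weights = [
    {"credit": 0.3, "card": 0.2, "process": 0.5},
    {"credit": 0.5, "limit": 0.5},
]

w_credit = problem_word_weight("credit", knowledge_weights, preset_weight=0.1)
w_unknown = problem_word_weight("weather", knowledge_weights, preset_weight=0.1)
```

Here "credit" appears in both entries and receives the average of 0.3 and 0.5, while "weather" appears in neither and falls back to the preset weight.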
In the present application, as shown in Fig. 6, the vector similarity acquiring unit 403 includes:
A knowledge vector acquisition subelement 4031, for obtaining the vector of the alternative knowledge;
A problem vector acquisition subelement 4032, for obtaining the vector of the problem;
A vector similarity calculation subelement 4033, for calculating the vector similarity using the vector of the alternative knowledge and
the vector of the problem.
In the present application, the knowledge vector acquisition subelement 4031 includes:
A knowledge word vector acquisition subordinate unit 40311, for obtaining the word vectors of the knowledge words, where the word vector of a
knowledge word is its word vector within the alternative knowledge;
A knowledge vector calculation subordinate unit 40312, for calculating the vector of the alternative knowledge using the word vectors of
the knowledge words.
In the present application, the problem vector acquisition subelement 4032 includes:
A problem word vector acquisition subordinate unit 40321, for obtaining the word vectors of the problem words, where the word vector of a
problem word is the same as the word vector of the identical knowledge word;
A problem vector calculation subordinate unit 40322, for calculating the vector of the problem using the word vectors of the problem words.
In the present application, the weight similarity is obtained using one of, or a combination of, Jaccard distance, Hamming distance and
edit distance;
The vector similarity is obtained using the cosine;
The total similarity between an alternative knowledge and the problem is the sum of the weight similarity and the vector similarity
between that alternative knowledge and the problem;
The preset rule is to sort the total similarities between all alternative knowledge and the problem and to select the alternative
knowledge with the greatest total similarity.
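For reference, the three measures named above can be sketched in Python in their generic, unweighted forms. How the word weights enter these measures is defined elsewhere in the application and is not reproduced here, so the functions below are standard textbook definitions rather than the application's own formulas; all names and sample data are illustrative.

```python
def jaccard_similarity(words_a, words_b):
    """Size of the intersection over size of the union of two word sets."""
    set_a, set_b = set(words_a), set(words_b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here for two empty inputs
    return len(set_a & set_b) / len(union)


def hamming_distance(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(seq_a, seq_b))


def edit_distance(seq_a, seq_b):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(seq_b) + 1))
    for i, x in enumerate(seq_a, start=1):
        curr = [i]
        for j, y in enumerate(seq_b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Hypothetical knowledge words and problem words.
knowledge_words = ["credit", "card", "process"]
problem_words = ["how", "apply", "credit", "card"]
overlap = jaccard_similarity(knowledge_words, problem_words)
```

The Jaccard similarity equals one minus the Jaccard distance, so either form can serve as the basis of a similarity score.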
Fig. 7 shows a block diagram of a computer system 800 on which the embodiments can be implemented. The computer system 800
includes a processor 810, a storage medium 820, a system memory 830, a monitor 840, a keyboard 850, a mouse 860, a network interface 870
and a video adapter 880. These components are coupled by a system bus 890.
The storage medium 820 (such as a hard disk) stores multiple programs, including an operating system, application programs and other
program modules. A user can enter commands and information into the computer system 800 through input devices, for example the keyboard
850, a touch pad (not shown) and the mouse 860. The monitor 840 is used to display text and graphical information.
The operating system runs on the processor 810 and is used to coordinate and provide control of the various components of the
computer system 800 in Fig. 7. In addition, computer programs can be used on the computer system 800 to implement the various
embodiments described above.
It will be appreciated that the hardware components shown in Fig. 7 are for illustrative purposes only, and that the actual components
may vary depending on the computing device deployed to implement the invention.
In addition, the computer system 800 can be, for example, a desktop computer, a server computer, a laptop computer or a
wireless device such as a mobile phone, a personal digital assistant (PDA) or a handheld computer.
It will be appreciated that embodiments within the scope of the invention may be implemented in the form of a computer program product
comprising computer-executable instructions, such as program code, which can run in any suitable computing environment in conjunction with
any suitable operating system, for example Microsoft Windows, Linux or UNIX. Embodiments within the scope of the invention
may also include a program product comprising a computer-readable medium for carrying or storing computer-executable
instructions or data structures. Such a computer-readable medium can be any available medium that can be accessed by a
general-purpose or special-purpose computer. For example, such a computer-readable medium may include RAM, ROM, EPROM, EEPROM,
CD-ROM, magnetic disk storage or other storage devices, or any other medium that can be used to carry or store desired program code
in the form of computer-executable instructions and that can be accessed by a general-purpose or special-purpose computer.
The method, device and system for extracting answers in an intelligent question answering system provided by the present invention have
the following beneficial effects:
(1) the weights are normalized, which makes the weight a more accurate evaluation factor;
(2) the two factors of weight and vector are combined to judge the similarity between the problem and the knowledge, making the
similarity judgment more accurate;
(3) the work of manually correcting problems is reduced, saving the enterprise substantial labor costs.
The invention has been described in detail above with reference to specific embodiments and illustrative examples, but these descriptions
should not be construed as limiting the invention. Those skilled in the art will understand that various equivalent substitutions,
modifications or improvements can be made to the technical solution of the invention and its embodiments without departing from the spirit
and scope of the invention, and all of these fall within the scope of the invention. The scope of protection of the invention is defined by
the appended claims.