
CN104516986A - Statement identification method and device - Google Patents

Statement identification method and device

Info

Publication number
CN104516986A
CN104516986A CN201510024299.9A CN201510024299A CN104516986A CN 104516986 A CN104516986 A CN 104516986A CN 201510024299 A CN201510024299 A CN 201510024299A CN 104516986 A CN104516986 A CN 104516986A
Authority
CN
China
Prior art keywords
statement
identified
similarity
candidate
intention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510024299.9A
Other languages
Chinese (zh)
Other versions
CN104516986B (en)
Inventor
王金龙
贾明静
董日壮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao University of Technology
Original Assignee
Qingdao University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao University of Technology
Priority to CN201510024299.9A
Publication of CN104516986A
Application granted
Publication of CN104516986B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2452 Query translation
    • G06F16/24522 Translation of natural language queries to structured queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2452 Query translation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a sentence recognition method and device. The method comprises: for an obtained sentence to be identified, determining its non-stop words as keywords; selecting, from a preset sentence library, candidate sentences that contain the keywords of the sentence to be identified; and determining a topic classification label and an intent classification label of the sentence to be identified using a pre-built classification model, where the classification model can recognize intents of an unknown class. When the identified intent classification label is the unknown class and there are multiple candidate sentences, the candidate sentences are grouped according to their preset intent labels, and the preset information corresponding to the candidate sentences in each group is displayed. Because different groups correspond to different intent types, candidate sentences are selected from each group as target sentences and the preset information corresponding to each target sentence is displayed, which addresses the problem that the feedback is limited to a single kind of information or cannot be produced at all.

Description

A sentence recognition method and device
Technical field
The application relates to the technical field of language data processing, and in particular to a sentence recognition method and device.
Background
In the field of natural language processing, the intent of a natural-language sentence usually needs to be identified so that feedback information can be generated automatically. For example, in automatic question answering, when a user enters the sentence "why does the refrigerator not power on", intent recognition must be performed on the input sentence so that the reason the refrigerator does not power on can be fed back.
Because natural language is complex, one sentence often corresponds to several different sub-intents. For example, if the user enters "the refrigerator does not power on", the user may want to ask why the refrigerator does not power on, or may want to ask how to solve the problem.
At present, for sentences with several different sub-intents, the generated feedback information is rather limited: only the reason or only the solution is fed back, and in some cases no feedback information can be generated at all.
Summary of the invention
In view of this, the application provides a sentence recognition method and device to solve the technical problem that existing recognition methods output only a single kind of feedback information or cannot output feedback information at all. To achieve this purpose, the technical solution provided by the invention is as follows:
A sentence recognition method, comprising:
Obtaining a sentence to be identified;
Determining the non-stop words in the sentence to be identified as keywords;
Selecting, from a preset sentence library, candidate sentences that contain the keywords;
Determining a topic classification label and an intent classification label of the sentence to be identified using a pre-built classification model;
When the intent classification label is the unknown class and there are multiple candidate sentences, classifying the candidate sentences according to their respective preset intent labels to obtain multiple groups;
Determining candidate sentences in each group as target sentences, wherein the preset topic label of each target sentence is the same as the topic classification label of the sentence to be identified;
Displaying the preset information corresponding to each target sentence.
Optionally, the method further comprises:
When the intent classification label is not the unknown class, determining the similarity between the sentence to be identified and each candidate sentence;
Determining as the target sentence the candidate sentence that has the largest similarity among those exceeding a preset similarity threshold;
Displaying the preset information corresponding to the target sentence.
Optionally, determining candidate sentences in each group as target sentences comprises:
Determining the similarity between the sentence to be identified and each candidate sentence;
Sorting in descending order of similarity and, in each group, selecting as target sentences a preset number of top-ranked candidate sentences whose similarity exceeds the preset similarity threshold.
Optionally, determining the topic classification label and the intent classification label of the sentence to be identified using the pre-built classification model comprises:
Extracting multiple classification features from the sentence to be identified according to a preset feature-word extraction rule;
Inputting the classification features into the classification model to obtain multiple intent probability values and multiple topic probability values;
Determining the classification label corresponding to the largest intent probability value as the intent classification label of the sentence to be identified, and the classification label corresponding to the largest topic probability value as the topic classification label of the sentence to be identified.
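The label selection in the last step can be sketched as follows. This is only an illustration, assuming the classification model has already produced one probability value per label; the function name `pick_label`, the label names and the probability values are hypothetical and do not come from the patent.

```python
def pick_label(probabilities):
    """Return the label whose probability value is largest."""
    if not probabilities:
        raise ValueError("no probability values were provided")
    return max(probabilities, key=probabilities.get)


# Hypothetical model outputs for one sentence to be identified.
intent_probs = {"reason": 0.2, "method": 0.3, "unknown": 0.5}
topic_probs = {"fault": 0.7, "pre-sales": 0.2, "after-sales": 0.1}

intent_label = pick_label(intent_probs)
topic_label = pick_label(topic_probs)
```

The intent label and the topic label are chosen independently, each from its own set of probability values.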
Optionally, the classification model is built as follows:
Obtaining a training set comprising multiple labelled sentences, wherein each labelled sentence has its own intent label and topic label;
Training on the training set with a preset training method to obtain the classification model, wherein the classification model is used to classify the intent and the topic of the sentence to be identified.
Optionally, determining the similarity between the sentence to be identified and each candidate sentence comprises:
Calculating, for the sentence to be identified and each candidate sentence, a semantic similarity, a topic-intent similarity and a syntax similarity; wherein the semantic similarity is the semantic similarity between the keywords of the sentence to be identified and the keywords of the candidate sentence; the topic-intent similarity is the similarity between the topic and intent of the sentence to be identified and the topic and intent of the candidate sentence; and the syntax similarity is the similarity between the syntactic structure of the sentence to be identified and the syntactic structure of the candidate sentence;
Computing a weighted average of the semantic similarity, the topic-intent similarity and the syntax similarity of each candidate sentence to obtain the similarity between the sentence to be identified and that candidate sentence.
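The weighted average of the three similarities could look like the sketch below. The text does not state the weights, so the default weights here are an assumption chosen purely for illustration, as is the function name `combined_similarity`.

```python
def combined_similarity(semantic, topic_intent, syntax, weights=(0.5, 0.3, 0.2)):
    """Weighted average of the three component similarities.

    The default weights are illustrative assumptions, not values from the text.
    """
    w_semantic, w_topic_intent, w_syntax = weights
    total = w_semantic + w_topic_intent + w_syntax
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = (
        w_semantic * semantic
        + w_topic_intent * topic_intent
        + w_syntax * syntax
    )
    return weighted_sum / total


# Equal weights reduce to the plain mean of the three values.
score = combined_similarity(0.8, 1.0, 0.5, weights=(1, 1, 1))
```

Dividing by the sum of the weights keeps the result in the same range as the inputs even when the weights are not normalized.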
Optionally, calculating the semantic similarity between the sentence to be identified and a candidate sentence comprises:
Calculating in turn the word similarity between each keyword of the sentence to be identified and each keyword of the candidate sentence to obtain a similarity matrix;
Summing the maximum word similarity in each row of the similarity matrix and calculating the row mean of this sum;
Summing the maximum word similarity in each column of the similarity matrix and calculating the column mean of this sum;
Calculating the mean of the row mean and the column mean to obtain the semantic similarity between the sentence to be identified and the candidate sentence.
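One possible reading of this calculation is sketched below: rows correspond to keywords of the sentence to be identified, columns to keywords of the candidate sentence, and each mean is the sum of the maxima divided by the number of rows or columns. The word-similarity function is not specified in this passage, so `exact_match` is a hypothetical stand-in, and the function names are assumptions.

```python
def semantic_similarity(query_keywords, candidate_keywords, word_similarity):
    """Average of the row mean and column mean of the per-row and per-column maxima."""
    if not query_keywords or not candidate_keywords:
        return 0.0
    # One row per keyword of the sentence to be identified,
    # one column per keyword of the candidate sentence.
    matrix = [
        [word_similarity(q, c) for c in candidate_keywords]
        for q in query_keywords
    ]
    row_mean = sum(max(row) for row in matrix) / len(query_keywords)
    column_mean = sum(max(col) for col in zip(*matrix)) / len(candidate_keywords)
    return (row_mean + column_mean) / 2


def exact_match(a, b):
    """Hypothetical word similarity: 1.0 for identical words, otherwise 0.0."""
    return 1.0 if a == b else 0.0


result = semantic_similarity(
    ["why", "refrigerator", "power"],
    ["refrigerator", "power"],
    exact_match,
)
```

With these inputs the row maxima are 0, 1 and 1 (row mean 2/3) and the column maxima are 1 and 1 (column mean 1), so the result is 5/6. Taking both directions keeps the measure symmetric when the two sentences have different numbers of keywords.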
Optionally, calculating the topic-intent similarity between the sentence to be identified and a candidate sentence comprises:
Judging whether the topic classification label of the sentence to be identified is the same as the preset topic classification label of the candidate sentence, to obtain a first judgment result;
Judging whether the intent classification label of the sentence to be identified is the unknown class, to obtain a second judgment result;
Judging whether the intent classification label of the sentence to be identified is the same as the preset intent label of the candidate sentence, to obtain a third judgment result;
When the first judgment result is yes and the second judgment result is yes, determining that the topic-intent similarity is 1;
When the first judgment result is yes, the second judgment result is no and the third judgment result is yes, determining that the topic-intent similarity is 1;
When the first judgment result is yes, the second judgment result is no and the third judgment result is no, determining that the topic-intent similarity is a preset value greater than 0 and less than 1;
When the first judgment result is no, determining that the topic-intent similarity is 0.
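The four cases above map directly onto a small decision function. The sketch below assumes labels are plain strings; the name `topic_intent_similarity`, the label `"unknown"` and the default preset value 0.5 are assumptions, since the text only requires the preset value to lie strictly between 0 and 1.

```python
def topic_intent_similarity(query_topic, query_intent, cand_topic, cand_intent,
                            partial_value=0.5, unknown_label="unknown"):
    """Apply the judgment rules for the topic-intent similarity.

    partial_value is the preset value for "same topic, different intent";
    0.5 is an illustrative assumption.
    """
    if not 0 < partial_value < 1:
        raise ValueError("partial_value must be greater than 0 and less than 1")
    if query_topic != cand_topic:      # first judgment result is no
        return 0.0
    if query_intent == unknown_label:  # second judgment result is yes
        return 1.0
    if query_intent == cand_intent:    # third judgment result is yes
        return 1.0
    return partial_value               # same topic, different known intent


same_topic_unknown = topic_intent_similarity("fault", "unknown", "fault", "reason")
same_topic_other_intent = topic_intent_similarity("fault", "method", "fault", "reason")
other_topic = topic_intent_similarity("pre-sales", "reason", "fault", "reason")
```

Checking the topic first means the intent labels are never compared when the topics differ, which matches the last rule.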
Optionally, calculating the syntax similarity between the sentence to be identified and a candidate sentence comprises:
Performing syntactic analysis on the sentence to be identified to obtain first syntactic components of the sentence to be identified, and obtaining preset second syntactic components of the candidate sentence;
Calculating a first word similarity over the identical components of the first syntactic components and the second syntactic components;
Calculating a second word similarity over the identical modifier components of the first syntactic components and the second syntactic components;
Obtaining a preset penalty factor for the non-identical components of the first syntactic components and the second syntactic components;
Calculating a weighted mean of the first word similarity, the second word similarity and the preset penalty factor to obtain the syntax similarity.
Optionally, when multiple keywords are determined, selecting from the preset sentence library the candidate sentences that contain the keywords comprises:
Counting, for each sentence in the preset sentence library, the number of keywords of the sentence to be identified that it contains;
Sorting in descending order of the number of keywords contained, and selecting a preset number of top-ranked sentences as candidate sentences.
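A minimal sketch of this selection follows. It assumes the library sentences are already segmented into word lists and that sentences containing none of the keywords are dropped; the function name `select_candidates` and the sample library are hypothetical.

```python
def select_candidates(keywords, sentence_library, limit):
    """Rank library sentences by how many distinct keywords they contain."""
    keyword_set = set(keywords)
    scored = []
    for tokens in sentence_library:
        hits = len(keyword_set & set(tokens))
        if hits > 0:  # a candidate must contain at least one keyword
            scored.append((hits, tokens))
    # list.sort is stable, so ties keep their library order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [tokens for _, tokens in scored[:limit]]


library = [
    ["my", "refrigerator", "power"],
    ["washing", "machine", "noise"],
    ["why", "refrigerator", "power"],
]
candidates = select_candidates(["why", "refrigerator", "power"], library, 2)
```

Here the third library sentence contains three keywords and the first contains two, so they are returned in that order, while the second sentence contains none and is excluded.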
Optionally, determining the non-stop words in the sentence to be identified as keywords comprises:
Performing word segmentation on the sentence to be identified to obtain multiple segmented words;
Removing the stop words from the segmented words to obtain the keywords.
The application also provides a sentence recognition device, comprising:
A to-be-identified sentence acquisition module, for obtaining a sentence to be identified;
A keyword determination module, for determining the non-stop words in the sentence to be identified as keywords;
A candidate sentence acquisition module, for selecting, from a preset sentence library, candidate sentences that contain the keywords;
A topic and intent determination module, for determining a topic classification label and an intent classification label of the sentence to be identified using a pre-built classification model;
A candidate sentence grouping module, for classifying the candidate sentences according to their respective preset intent labels to obtain multiple groups when the intent classification label is the unknown class and there are multiple candidate sentences;
A target sentence determination module, for determining candidate sentences in each group as the target sentences corresponding to the sentence to be identified, wherein the preset topic label of each target sentence is the same as the topic classification label of the sentence to be identified;
A preset information display module, for displaying the preset information corresponding to each target sentence.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a sentence recognition method and device. In the method, the non-stop words of the obtained sentence to be identified are determined as keywords, candidate sentences containing those keywords are selected from a preset sentence library, and the topic classification label and the intent classification label of the sentence to be identified are determined using a pre-built classification model. It should be noted that the classification model can recognize intents of the unknown class. When the identified intent classification label is the unknown class and there are multiple candidate sentences, the candidate sentences are grouped according to their preset intent labels, and the preset information corresponding to the candidate sentences in each group is displayed. Because different groups correspond to different intent types, candidate sentences are selected from each group as target sentences and the preset information corresponding to each target sentence is displayed, which solves the problem that the feedback is limited to a single kind of information or cannot be produced at all.
Brief description of the drawings
To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only embodiments of the invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the sentence recognition method provided by an embodiment of the invention;
Fig. 2 is a flowchart of the sentence recognition method provided by another embodiment of the invention;
Fig. 3 is a flowchart for determining the semantic similarity, provided by a further embodiment of the invention;
Fig. 4 is a schematic structural diagram of the sentence recognition device provided by an embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
Fig. 1 shows the flow of the sentence recognition method provided by an embodiment of the invention, which comprises the following steps:
Step S101: obtain a sentence to be identified.
For example, the obtained sentence to be identified is "why does the refrigerator not power on". Optionally, the sentence to be identified is a sentence entered by a user or a sentence supplied by another program.
Step S102: determine the non-stop words in the sentence to be identified as keywords.
Stop words are words in a sentence that carry no concrete meaning, such as function words. The sentence to be identified contains multiple words, and those that are not stop words are determined as keywords. For example, for the sentence to be identified given above, the keywords are "why", "refrigerator", "not" and "power on".
Step S103: select, from a preset sentence library, candidate sentences that contain the keywords.
In this embodiment a sentence library containing multiple sentences is set up in advance. Each sentence consists of multiple words, and a library sentence whose words include a keyword of the sentence to be identified is selected as a candidate sentence. It should be noted that a candidate sentence is not required to contain all of the keywords; containing any one or more of them is sufficient.
For example, for the sentence to be identified given above, the selected candidate sentences include "why does my refrigerator not power on" and "how come the refrigerator does not power on".
Step S104: determine the topic classification label and the intent classification label of the sentence to be identified using a pre-built classification model.
In this embodiment a classification model is built in advance. It is obtained with a classification training method and can recognize a preset number of topic classes and intent classes. When a sentence to be identified is input, the classification model determines which topic class and which intent class the sentence belongs to. For example, the classification model can recognize topic classes such as the fault class, the pre-sales class and the after-sales class, and intent classes such as the recommendation class, fact class, evaluation class, method class, demand class, enumeration class, yes/no class, comparison class, reason class, description class, relation class and unknown class. These are only examples; the topic classification labels and intent classification labels in this embodiment may differ from one application scenario to another.
The topic classification label thus indicates the scope the sentence to be identified belongs to, that is, which area the user wants to ask about. For example, the input sentence "which brand of refrigerator is more energy-efficient" shows that the user wants to ask a pre-sales question, and the input sentence "why does my refrigerator not power on" shows that the user wants to ask about a fault.
The intent classification label indicates the type of feedback information that corresponds to the sentence to be identified, that is, the kind of content the user wants to receive. For example, the input sentence "which brand of refrigerator is more energy-efficient" shows that the user wants a recommendation, so its intent classification label is the recommendation class; the input sentence "why does my refrigerator not power on" shows that the user wants to know the reason the refrigerator does not power on, so its intent classification label is the reason class.
It should be noted that the intent classes the classification model can determine include the unknown class. The unknown class indicates that the intent of the sentence to be identified is not clear and can be understood as two or more sub-intents. For example, the intent of the sentence "my refrigerator does not power on" is not clear: the user may be asking how to solve the problem or what the reason is. For such a sentence, the classification model determines the intent classification label "unknown class". For the sentence "why does the refrigerator not power on", the classification model determines the intent classification label "reason class".
The building process of the classification model is described later in this document.
Step S105: when the intent classification label is the unknown class and there are multiple candidate sentences, classify the candidate sentences according to their respective preset intent labels to obtain multiple groups.
It should be noted that each sentence in the preset sentence library has a preset intent label, which indicates the type of feedback information that is wanted. For example, the preset intent label of "how come my refrigerator does not power on" is the "reason class", the preset intent label of "which brand of refrigerator offers better value for money" is the "recommendation class", and the preset intent label of "how is the energy-saving performance of Haier refrigerators" is the "evaluation class". Each candidate sentence selected from the preset sentence library therefore also has a preset intent label.
The intent classification label of the sentence to be identified is checked. If it is the unknown class, the intent of the sentence is not clear, and possible candidate sentences are selected for each of the possible intents. Specifically, when step S103 yields multiple candidate sentences, they may cover several intent types, so they are classified by their preset intent labels: candidate sentences with the same preset intent label are placed in one group, which yields multiple groups.
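The grouping in step S105 amounts to bucketing candidates by their preset intent label. The sketch below assumes each candidate is a dictionary with `text` and `intent` keys; that record layout, the function name `group_by_intent` and the sample sentences' labels are illustrative assumptions.

```python
from collections import defaultdict


def group_by_intent(candidates):
    """Place candidate sentences with the same preset intent label in one group."""
    groups = defaultdict(list)
    for candidate in candidates:
        groups[candidate["intent"]].append(candidate)
    return dict(groups)


candidates = [
    {"text": "why does my refrigerator not power on", "intent": "reason"},
    {"text": "how to fix a refrigerator that does not power on", "intent": "method"},
    {"text": "how come the refrigerator does not power on", "intent": "reason"},
]
groups = group_by_intent(candidates)
```

Each key of the resulting dictionary is one intent type, so the number of groups equals the number of distinct preset intent labels among the candidates.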
Step S106: determine candidate sentences in each group as target sentences, wherein the preset topic label of each target sentence is the same as the topic classification label of the sentence to be identified.
It should be noted that, in addition to a preset intent label, each sentence in the preset sentence library also has a preset topic label. For example, the preset topic label of "how come my refrigerator does not power on" is the "fault class", the preset topic label of "which brand of refrigerator offers better value for money" is the "pre-sales class", and the preset topic label of "how is the energy-saving performance of Haier refrigerators" is the "pre-sales class". Each candidate sentence selected from the preset sentence library therefore also has a preset topic label.
Target sentences are determined within the groups of candidate sentences, and the preset topic label of each target sentence is the same as the topic label of the sentence to be identified determined in step S104. When the intent classification label of the sentence to be identified is the unknown class, its intent is not clear and may represent several different sub-intents, so target sentences need to be determined in every group of candidate sentences. In other words, the candidate sentences are divided into as many groups as they have distinct preset intent labels, and target sentences are determined for that many intents.
Specifically, a group may contain one or more candidate sentences, and candidate sentences are selected within each group. The selection may take all candidate sentences in the group or only some of them. When only some are selected, the same number of candidate sentences may be selected in each group, or different numbers may be selected. The detailed selection method is described later in this document.
Step S107: display the preset information corresponding to each target sentence.
Each sentence in the preset sentence library is provided with corresponding preset information. The sentence to be identified can be regarded as an input sentence, in which case the preset information is the feedback information for that input sentence. It should be noted that when there are multiple target sentences, multiple pieces of preset information are displayed.
The preset information depends on the application scenario of the recognition method, and different scenarios may use different preset information. For example, in a question-answering scenario, the sentence to be identified entered by the user is a question, and the preset information of a target sentence is an answer.
As the above technical solution shows, this embodiment provides a sentence recognition method in which the non-stop words of the obtained sentence to be identified are determined as keywords, candidate sentences containing those keywords are selected from a preset sentence library, and the topic classification label and the intent classification label of the sentence to be identified are determined using a pre-built classification model. It should be noted that the classification model can recognize intents of the unknown class. When the identified intent classification label is the unknown class and there are multiple candidate sentences, the candidate sentences are grouped according to their preset intent labels, and the preset information corresponding to the candidate sentences in each group is displayed. Because different groups correspond to different intent types, candidate sentences are selected from each group as target sentences and the preset information corresponding to each target sentence is displayed. This solves the problem that, after a sentence to be identified is input, the feedback is limited to a single kind of information or cannot be produced at all, and it meets the user's need for diverse feedback information.
It should be noted that the question-answering scenario above is only one example, and the invention is not limited to it. In a chat scenario, for example, the sentence entered by the user is the sentence to be identified, and the displayed preset information is a reply to that chat sentence.
As shown in Fig. 2, on the basis of the above embodiment, the method may further comprise:
Step S205: judge whether the intent classification label is the unknown class; if so, perform step S206; otherwise, perform step S207.
Step S207: determine the similarity between the sentence to be identified and each candidate sentence; determine as the target sentence the candidate sentence that has the largest similarity among those exceeding a preset similarity threshold; display the preset information corresponding to the target sentence.
Step S206: when the intent classification label is the unknown class and there are multiple candidate sentences, classify the candidate sentences according to their respective preset intent labels to obtain multiple groups; determine candidate sentences in each group as target sentences; display the preset information corresponding to each target sentence.
The other steps in this figure are as described above and are not repeated here.
Specifically, if the intent classification label of the sentence to be identified is not the unknown class, its intent is clear and is one definite class, such as the recommendation class, fact class, evaluation class, method class, demand class, enumeration class, yes/no class, comparison class, reason class, description class or relation class.
The similarity between the sentence to be identified and each candidate sentence is determined and compared with the preset similarity threshold. The candidate sentence that has the largest similarity among those exceeding the preset similarity threshold is determined as the target sentence, and the preset information corresponding to this target sentence is displayed.
In this embodiment, definite feedback information can be displayed for a sentence to be identified whose intent is clear, and determining the target sentence as the candidate with the largest similarity above the preset similarity threshold improves the accuracy of the feedback information provided.
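For the clear-intent branch, the selection can be sketched as a threshold filter followed by a maximum. The sketch assumes the similarities have already been computed and are held in a dictionary keyed by candidate sentence; the function name `best_candidate`, the sample scores and the threshold value are hypothetical.

```python
def best_candidate(similarities, threshold):
    """Return the candidate with the largest similarity above the threshold.

    Returns None when no candidate exceeds the threshold.
    """
    above = {cand: sim for cand, sim in similarities.items() if sim > threshold}
    if not above:
        return None
    return max(above, key=above.get)


# Hypothetical similarity values for three candidate sentences.
scores = {
    "why does my refrigerator not power on": 0.92,
    "how come the refrigerator does not power on": 0.81,
    "which brand of refrigerator is more energy-efficient": 0.30,
}
target = best_candidate(scores, threshold=0.6)
```

Returning `None` when nothing clears the threshold leaves it to the caller to decide what to display in that case, which the passage does not specify.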
When the intent classification label of the sentence to be identified is the unknown class, target sentences need to be determined among the candidate sentences, and only some of the candidate sentences may be selected as target sentences. Accordingly, a specific implementation of step S106, which determines candidate sentences in each group as target sentences, is:
Determine the similarity between the sentence to be identified and each candidate sentence; sort in descending order of similarity and, in each group, select as target sentences a preset number of top-ranked candidate sentences whose similarity exceeds the preset similarity threshold.
Specifically, the candidate sentences in each group are sorted in descending order of their similarity to the sentence to be identified, and a preset number of top-ranked candidate sentences that exceed the preset similarity threshold are selected as target sentences. The preset number is, for example, two. This is only an example; the preset number can be set according to the number of sentences in the sentence library, and the more sentences the library contains, the larger the preset number can be.
In general, in above-mentioned two embodiments, in candidate's statement during select target statement, in order to improve the accuracy of selection, the similarity of candidate's statement and default similar threshold value are compared, the candidate's statement exceeding default similar threshold value is defined as object statement.Particularly, when the intent classifier label determined is unknown class, similarity in candidate's statement is exceeded default similar threshold value and candidate's statement of the forward predetermined number that sorts is defined as object statement; When the non-unknown class of the intent classifier label determined, similarity in candidate's statement is exceeded default similar threshold value and is defined as object statement for candidate's statement of maximal value.
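The two selection rules summarized above can be sketched in Python as follows. This is only an illustrative sketch: the names `select_targets` and `UNKNOWN`, the tuple layout of a candidate, and the default values of `threshold` and `top_k` are assumptions made for the example and do not come from the embodiment.

```python
from collections import defaultdict

UNKNOWN = "unknown"  # hypothetical name for the unknown intent class


def select_targets(candidates, intent_label, threshold=0.5, top_k=2):
    """Pick target statements from candidates.

    candidates: list of (statement, preset_intention_label, similarity).
    threshold and top_k are illustrative values, not values from the source.
    """
    # Only candidates whose similarity exceeds the threshold are eligible.
    above = [c for c in candidates if c[2] > threshold]
    if intent_label != UNKNOWN:
        # Explicit intent: the single eligible candidate with maximum similarity.
        return [max(above, key=lambda c: c[2])] if above else []
    # Unknown intent: group by preset intention label, keep top_k per group.
    groups = defaultdict(list)
    for c in above:
        groups[c[1]].append(c)
    targets = []
    for members in groups.values():
        ranked = sorted(members, key=lambda c: c[2], reverse=True)
        targets.extend(ranked[:top_k])
    return targets
```

With an explicit intent label the function returns at most one statement; with the unknown class it returns up to `top_k` statements per intention group, so the feedback covers several possible intentions.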
The process of determining the non-stop words in the statement to be identified as keywords is: perform word segmentation on the statement to be identified to obtain multiple segmented words; remove the stop words from the segmented words to obtain the keywords.
When removing stop words, a preset stop-word list can be used: any segmented word that appears in the stop-word list is removed, leaving the keywords.
In addition, when multiple keywords are determined, the process of selecting candidate statements containing the keywords from the preset statement library comprises:
Count, for each statement in the preset statement library, the number of keywords of the statement to be identified that it contains; sort the statements in descending order of this count, and select the preset number of top-ranked statements as candidate statements.
Selecting candidate statements in this way yields higher accuracy.
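The keyword extraction and candidate selection just described can be sketched as below. The sketch assumes that word segmentation has already been performed by some external tool, so statements arrive as token lists; the function names, the stop-word set, and the dictionary layout of the statement library are hypothetical.

```python
def extract_keywords(tokens, stop_words):
    """Keep the segmented words that are not in the stop-word list."""
    return [t for t in tokens if t not in stop_words]


def choose_candidates(keywords, library, top_n=2):
    """Rank library statements by how many query keywords they contain.

    library: dict mapping a statement id to that statement's token list.
    top_n is an illustrative preset number.
    """
    unique = set(keywords)
    counts = {sid: sum(1 for k in unique if k in toks)
              for sid, toks in library.items()}
    # Descending order of keyword count; sorted() is stable for ties.
    ranked = sorted(counts, key=lambda sid: counts[sid], reverse=True)
    # A candidate must contain at least one keyword.
    return [sid for sid in ranked[:top_n] if counts[sid] > 0]
```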
The classification model built in advance in the above embodiments is described below.
Obtain a training set comprising multiple annotated statements, where each annotated statement has its own intention label and theme label; train on the training set using a preset training method to obtain a classification model, where the classification model is used to classify the intention and theme of the statement to be identified.
First, a training set comprising multiple annotated statements is obtained. The statements are annotated manually, that is, the intention label and theme label of each statement are entered by hand. The intention label indicates the type of feedback information desired; for example, for the statement "why is the refrigerator not powering on", the manually entered intention label is "reason class", indicating that the desired feedback is a reason. The theme label indicates the type of content the statement expresses; for the same example, the manually entered theme label is "fault class", indicating that the statement expresses fault-related content.
Then, the training set is trained with a training method to obtain the classification model. The training method can be any training method in the prior art; for example, the libsvm tool can be used. It should be noted that the more annotated statements the training set contains, the higher the recognition accuracy of the classification model. In addition, the resulting classification model can classify both theme and intention.
Using the classification model thus built, the theme classification label and intent classification label of the statement to be identified are determined as follows:
Extract multiple classification features from the statement to be identified according to a preset feature-word extraction rule; input the classification features into the classification model to obtain multiple intention probability values and multiple theme probability values; determine the classification label corresponding to the maximum intention probability value as the intent classification label of the statement to be identified, and determine the classification label corresponding to the maximum theme probability value as its theme classification label.
Specifically, in a question-answering scenario, the classification features extracted by the preset feature-word extraction rule can comprise any one or more of three kinds of features: N-gram word features, interrogative-word features, and syntactic features. The N-gram word features can be unigram, bigram, and trigram word features. An interrogative-word feature is a feature pair composed of the interrogative word and the part of speech of a neighboring keyword. A syntactic feature is a keyword pair that depends on the predicate verb or the interrogative word: when a keyword depends on the predicate verb or the interrogative word, this dependency relation is extracted. It should be noted that the extracted classification features fall into three types: word pairs, part-of-speech pairs, and combinations of a part of speech and a word. For example, the classification features extracted for "why is the refrigerator not powering on" are shown in Table 1 below.
Table 1
The classification model has a set of theme types and a set of intention types that it can recognize. Using the input classification features, it calculates the probability that the statement to be identified belongs to each theme type and the probability that it belongs to each intention type. The classification label corresponding to the maximum theme probability value is determined as the theme classification label of the statement to be identified; likewise, the classification label corresponding to the maximum intention probability value is determined as its intent classification label.
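The label selection described above amounts to taking the label with the largest probability for theme and for intention separately. A minimal sketch, assuming the classifier output is available as a dictionary from label to probability (the function name and the probability values in the usage note are made up for illustration):

```python
def pick_label(probabilities):
    """Return the label whose probability value is the largest.

    probabilities: dict mapping a classification label to its probability.
    """
    return max(probabilities, key=probabilities.get)
```

For instance, `pick_label({"reason class": 0.7, "method class": 0.2, "unknown class": 0.1})` selects "reason class"; the same function would be applied to the theme probability values.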
It should be noted that, during classification, interrogative-word features are most important for classifying statements to be identified that are questions. For example, a question containing the interrogative word "why" is usually assigned to the reason class, and one containing "how" is usually assigned to the method class. However, interrogative words are used very flexibly. Take "how": in questions such as "how can the vocals in music be removed" and "how is dance music classified", the position of the interrogative word varies. In addition, interrogative words usually combine with certain nouns, verbs, and so on to jointly express the query information.
Therefore, the embodiment of the present invention uses the part-of-speech collocation of the words before and after the interrogative word as a feature. Taking "how does a mobile phone download music?" as an example, the result after word segmentation and part-of-speech tagging is: mobile phone/n how/r download/v music/n. The part-of-speech features around the interrogative word are extracted as n-how, how-v, and n-how-v, and <n, how>, <how, v>, and <n, how, v> are then used as interrogative-word features.
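The extraction in this example can be sketched as follows. The sketch assumes the statement has already been segmented and tagged into (word, part-of-speech) pairs by an external tool, and that a set of interrogative words is supplied; the function name and both inputs are hypothetical.

```python
def interrogative_features(tagged, interrogatives):
    """Build part-of-speech collocation features around interrogative words.

    tagged: list of (word, part_of_speech) pairs in sentence order.
    interrogatives: set of words treated as interrogative words.
    """
    feats = []
    for i, (word, _pos) in enumerate(tagged):
        if word not in interrogatives:
            continue
        prev_pos = tagged[i - 1][1] if i > 0 else None
        next_pos = tagged[i + 1][1] if i + 1 < len(tagged) else None
        if prev_pos:
            feats.append((prev_pos, word))            # e.g. <n, how>
        if next_pos:
            feats.append((word, next_pos))            # e.g. <how, v>
        if prev_pos and next_pos:
            feats.append((prev_pos, word, next_pos))  # e.g. <n, how, v>
    return feats
```

Applied to the tagged example above, the function yields the three features <n, how>, <how, v>, and <n, how, v>.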
In addition, syntactic features also play an important role in classifying statements to be identified that are questions. For example, in "recommend a few pleasant songs", the verb "recommend" is very important for judging the class of the whole question.
Therefore, the embodiment of the present invention can use dependency features: the nouns and interrogative words that have a dependency relation with the predicate verb of the question are extracted and combined into dependency pairs, which serve as syntactic features. To make the features more general, the parts of speech of the dependency pairs are also added as syntactic features. For example, for the statement to be identified "how to convert the music format in a music player?", the word-pair features obtained are: how-convert, convert-format; the word and part-of-speech features are: how-v, convert-n, r-convert, v-format; and the part-of-speech pair features are: r-v, v-n.
Of course, the above classification features can also be used in other scenarios, where the interrogative-word feature may be unnecessary or may be replaced by another type of word feature.
In the above embodiments, the similarity between the statement to be identified and each candidate statement needs to be determined. The determination comprises the following steps:
Calculate the semantic similarity, theme-intention similarity, and syntax similarity between the statement to be identified and each candidate statement, where the semantic similarity is the semantic similarity between the keywords of the statement to be identified and the keywords of the candidate statement; the theme-intention similarity is the similarity between the theme and intention of the statement to be identified and those of the candidate statement; and the syntax similarity is the similarity between the syntactic structure of the statement to be identified and that of the candidate statement;
Compute the weighted average of the semantic similarity, theme-intention similarity, and syntax similarity corresponding to each candidate statement to obtain the similarity between the statement to be identified and that candidate statement.
Specifically, as shown in Figure 3, the semantic similarity Sim_word(A, B) between the statement to be identified and each candidate statement is calculated as follows:
Step S301: calculate in turn the word similarity between each keyword of the statement to be identified and each keyword of the candidate statement to obtain a similarity matrix.
Here, the statement to be identified comprises multiple keywords, which can be called first keywords, and the candidate statement also comprises multiple keywords, which can be called second keywords; the word similarity between each first keyword and each second keyword is calculated. The word similarity can be calculated by any method in the prior art, for example, a semantic relatedness calculation based on HowNet. In addition, a keyword is a non-stop word.
For example, if the statement to be identified is A with keywords (A_1, A_2, ..., A_m), and the candidate statement is B with keywords (B_1, B_2, ..., B_n), then after the word relatedness values are calculated, the similarity matrix S_{A,B} is obtained:
$$S_{A,B} = \begin{pmatrix} S(A_1, B_1) & S(A_1, B_2) & \cdots & S(A_1, B_n) \\ S(A_2, B_1) & S(A_2, B_2) & \cdots & S(A_2, B_n) \\ \vdots & \vdots & \ddots & \vdots \\ S(A_m, B_1) & S(A_m, B_2) & \cdots & S(A_m, B_n) \end{pmatrix}$$
Here, S(A_i, B_j) denotes the word relatedness between the i-th keyword of the statement to be identified A and the j-th keyword of the candidate statement B.
Step S302: sum the maximum word similarity of each row of the similarity matrix, and calculate the row mean of this sum.
Step S303: sum the maximum word similarity of each column of the similarity matrix, and calculate the column mean of this sum.
The matrix comprises multiple rows and multiple columns. Either the number of rows equals the number of keywords m of A and the number of columns equals the number of keywords n of B, or the number of rows equals n and the number of columns equals m.
Determine the maximum word similarity in each row of the matrix, sum these maxima, and divide the sum by the number of rows to obtain the mean Sim(A, B). Likewise, calculate the mean Sim(B, A) from the column maxima.
Step S304: calculate the mean of the row mean and the column mean to obtain the semantic relatedness of the statement to be identified and the candidate statement.
That is, the mean Sim_word(A, B) of the two means is calculated:
$$\mathrm{Sim}_{word}(A, B) = \frac{\mathrm{Sim}(A, B) + \mathrm{Sim}(B, A)}{2}$$
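Steps S302 to S304 can be sketched as below, assuming the similarity matrix of step S301 has already been computed by some word-similarity method and is passed in as a list of rows (rows for the keywords of A, columns for the keywords of B). The function name is hypothetical.

```python
def semantic_similarity(matrix):
    """Combine row-wise and column-wise best matches of a similarity matrix.

    matrix: non-empty list of equal-length rows of word similarities.
    """
    # S302: mean of the maximum value in each row, i.e. Sim(A, B).
    row_mean = sum(max(row) for row in matrix) / len(matrix)
    # S303: mean of the maximum value in each column, i.e. Sim(B, A).
    columns = list(zip(*matrix))
    col_mean = sum(max(col) for col in columns) / len(columns)
    # S304: average of the two means, i.e. Sim_word(A, B).
    return (row_mean + col_mean) / 2
```

Averaging both directions keeps the measure symmetric: a short statement whose keywords all match a long one does not score as high as two statements that match each other fully.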
In addition, the theme-intention similarity Sim_style(A, B) between the statement to be identified and each candidate statement is calculated as follows:
Judge whether the theme classification label of the statement to be identified is the same as the preset theme classification label of the candidate statement to obtain a first judgment result; judge whether the intent classification label of the statement to be identified is the unknown class to obtain a second judgment result; judge whether the intent classification label of the statement to be identified is the same as the preset intention label of the candidate statement to obtain a third judgment result;
When the first judgment result is yes and the second judgment result is yes, determine that the theme-intention similarity is 1; when the first judgment result is yes, the second is no, and the third is yes, determine that the theme-intention similarity is 1; when the first judgment result is yes, the second is no, and the third is no, determine that the theme-intention similarity is a preset value greater than 0 and less than 1; when the first judgment result is no, determine that the theme-intention similarity is 0.
It should be noted that step S104 in the above embodiment determines the theme classification label and intent classification label of the statement to be identified, and these labels can be used to determine the theme-intention similarity between the statement to be identified and the candidate statement.
In addition, the three judgments can be performed simultaneously or in sequence. When they are performed in sequence, to maximize execution efficiency, first check whether the theme label of the statement to be identified is the same as that of the candidate statement. If the theme labels differ, the theme-intention similarity between the statement to be identified and the candidate statement is set to 0. If they are the same, judge whether the intent classification label of the statement to be identified is the unknown class. If it is the unknown class, the theme-intention similarity is set to 1. If it is an explicit intent classification label, judge whether the intention labels of the statement to be identified and the candidate statement are the same: if so, the similarity is set to 1; if not, it is set to a preset value greater than 0 and less than 1.
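The sequential order of judgments described above maps directly onto a short decision function. In this sketch the name `theme_intent_similarity`, the constant `UNKNOWN`, and the default `partial=0.5` are assumptions; the source only requires the last value to be a preset number strictly between 0 and 1.

```python
UNKNOWN = "unknown"  # hypothetical name for the unknown intent class


def theme_intent_similarity(query_theme, query_intent,
                            cand_theme, cand_intent, partial=0.5):
    """Return Sim_style for one candidate statement.

    partial is an illustrative preset value with 0 < partial < 1.
    """
    if query_theme != cand_theme:    # first judgment is "no"
        return 0.0
    if query_intent == UNKNOWN:      # second judgment is "yes"
        return 1.0
    if query_intent == cand_intent:  # third judgment is "yes"
        return 1.0
    return partial                   # same theme, different explicit intent
```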
Furthermore, the syntax similarity Sim_syntax(A, B) between the statement to be identified and each candidate statement is calculated as follows:
Perform syntactic analysis on the statement to be identified to obtain its first syntactic components, and obtain the preset second syntactic components of the candidate statement; calculate a first word similarity for the identical components of the first and second syntactic components; calculate a second word similarity for the identical modifier components of the first and second syntactic components; obtain a preset penalty factor for the non-identical components of the first and second syntactic components; and calculate a weighted average using the first word similarity, the second word similarity, and the preset penalty factor to obtain the syntax similarity.
It should be noted that the first syntactic components are those of the statement to be identified, and the second syntactic components are those of the candidate statement. The syntax similarity is the similarity between corresponding syntactic components of the two, such as the word similarity between subjects, predicates, objects, and other dependent components. For missing components, a penalty factor is applied as compensation, and the weighted average is finally calculated. The formula for the weighted average is:
$$\mathrm{Sim}_{syntax}(A, B) = \frac{w_S \cdot s_S + w_V \cdot s_V + w_O \cdot s_O + w_A \cdot s_A + \sum_{i=1}^{n} w_R \cdot \mathrm{Sim}(h_{1i}, h_{2i})}{w_S + w_V + w_O + w_A + w_R \cdot n} - l \cdot PF$$
Here, s_S, s_V, s_O, and s_A are the similarities of the subject, predicate, object, and adverbial/complement, respectively; w_S, w_V, w_O, and w_A are the preset weights of the subject, predicate, object, and adverbial/complement, respectively; Sim(h_{1i}, h_{2i}) is the word similarity of the i-th identical modifier component, that is, a component other than the subject, predicate, object, and adverbial/complement. For example, if the first syntactic components comprise the subject "refrigerator" with the modifier "Haier", and the second syntactic components comprise the subject "refrigerator" with the modifier "Midea", the word similarity between the two modifiers is calculated. w_R is the preset weight of the identical modifier components; n is the total number of identical modifier components; l is the number of non-identical components, that is, components exclusive to one side, comprising those present in the first syntactic components but absent from the second and those present in the second but absent from the first; PF is the preset penalty factor.
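A minimal sketch of this weighted average is given below. It rests on one reading of the formula, namely that the penalty term l · PF is subtracted from the weighted average of the component similarities; that reading, the function name, and the dictionary layout of the inputs are assumptions, and the component similarities themselves are taken as given.

```python
def syntax_similarity(core_sims, core_weights, modifier_sims,
                      w_r, unmatched, pf):
    """Weighted average of component similarities minus a penalty.

    core_sims / core_weights: dicts keyed by "S", "V", "O", "A"
        (subject, predicate, object, adverbial/complement).
    modifier_sims: word similarities of the n identical modifier components.
    w_r: preset weight of an identical modifier component.
    unmatched: l, the number of components exclusive to one statement.
    pf: preset penalty factor (assumed here to be subtracted).
    """
    numerator = sum(core_weights[k] * core_sims[k] for k in core_weights)
    numerator += sum(w_r * s for s in modifier_sims)
    denominator = sum(core_weights.values()) + w_r * len(modifier_sims)
    return numerator / denominator - unmatched * pf
```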
It should be noted that a syntactic analysis tool, such as the LTP syntactic analysis tool, can be used for syntactic analysis. The syntactic structure obtained is a syntactic structure tree. Each node of the tree has a number; when a syntactic component depends on another syntactic component, its dependency node is set to the number of the component it depends on. For example, as shown in Table 2, after syntactic component analysis of the statement to be identified, the dependency node of the keyword "refrigerator" is 3, and the keyword numbered 3 is "power on", indicating that "refrigerator" depends on "power on".
Table 2
Number   Word           Part of speech   Dependency node   Dependency relation
0        why            r                3                 ADV
1        refrigerator   n                3                 SBV
2        not            d                3                 ADV
3        power on       v                -1                HED
4        ?              u                3                 RAD
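As an illustration of how the dependency-node column of Table 2 can be read, the rows can be held as simple records and the head of a word looked up by number. The record layout and the function name are hypothetical, and the words are English glosses of the segmented tokens.

```python
# Rows of Table 2: (number, word, part of speech, dependency node, relation).
ROWS = [
    (0, "why", "r", 3, "ADV"),
    (1, "refrigerator", "n", 3, "SBV"),
    (2, "not", "d", 3, "ADV"),
    (3, "power on", "v", -1, "HED"),
    (4, "?", "u", 3, "RAD"),
]


def head_of(rows, word):
    """Return the word that `word` depends on, or None for the root."""
    by_number = {row[0]: row for row in rows}
    for _number, w, _pos, head, _relation in rows:
        if w == word:
            # A dependency node of -1 has no matching row: the word is the root.
            return by_number[head][1] if head in by_number else None
    return None
```

Here `head_of(ROWS, "refrigerator")` gives "power on", matching the reading of the table described in the text.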
It should be noted that the parts of speech of keywords need to be tagged during syntactic analysis, that is, each keyword is tagged as a noun, verb, adverb, and so on; the part-of-speech tagging tool used can be the ansj tool. In addition, the dependency nodes and dependency relations obtained by syntactic analysis help in calculating the syntax similarity. When calculating the syntax similarity Sim_syntax(A, B), the identical modifier components need to be determined: the dependency nodes are used to judge whether two components depend on the same keyword, and if they do, the dependency relations are used to identify the modifier components of that keyword.
In addition, syntactic analysis determines the syntactic components, which comprise part of speech, dependency node, dependency relation, and so on; with these, the target statement can be located more accurately and more accurate information can be fed back. Specifically, the key component of the statement to be identified can be determined through the dependency nodes. For example, the key component of the input statement to be identified "how much is a refrigerator" is "refrigerator", whereas the key component of the candidate statement "how much is a refrigerator seal strip" is "seal strip"; the two differ. A prior-art recognition method would directly determine a statement containing "refrigerator" as the target statement, which is inaccurate. The present invention, however, can determine that the two differ and therefore does not determine the second statement as the target statement, which improves the accuracy of target statement determination.
After the semantic similarity Sim_word(A, B), the theme-intention similarity Sim_style(A, B), and the syntax similarity Sim_syntax(A, B) have been calculated, they are weighted and averaged; for example, the similarity Sim(A, B) between the statement to be identified and each candidate statement is calculated by the following formula.
$$\mathrm{Sim}(A, B) = \alpha \cdot \mathrm{Sim}_{word}(A, B) + \beta \cdot \mathrm{Sim}_{style}(A, B) + \gamma \cdot \mathrm{Sim}_{syntax}(A, B)$$
Here, α + β + γ = 1.
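The final combination is a plain weighted sum. In the sketch below, the function name and the default weights are assumptions chosen only so that they add up to 1; the source does not give concrete values for α, β, and γ.

```python
def overall_similarity(sim_word, sim_style, sim_syntax,
                       alpha=0.5, beta=0.25, gamma=0.25):
    """Weighted combination of the three similarities.

    alpha, beta and gamma are illustrative weights and must sum to 1.
    """
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("alpha + beta + gamma must equal 1")
    return alpha * sim_word + beta * sim_style + gamma * sim_syntax
```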
The statement recognition device provided by the embodiment of the present invention is described below. It should be noted that, for details of the statement recognition device, reference may be made to the statement recognition method provided above, which is not repeated here.
Figure 4 shows the structure of the statement recognition device provided by the embodiment of the present invention, which specifically comprises:
A to-be-identified statement acquisition module 100, for obtaining the statement to be identified;
A keyword determination module 200, for determining the non-stop words in the statement to be identified as keywords;
A candidate statement acquisition module 300, for selecting, from the preset statement library, candidate statements containing the keywords;
A theme and intention determination module 400, for determining the theme classification label and intent classification label of the statement to be identified using the classification model built in advance;
A candidate statement grouping module 500, for classifying the candidate statements according to their respective preset intention labels to obtain multiple groups when the intent classification label is the unknown class and there are multiple candidate statements;
A target statement determination module 600, for determining the candidate statements in each group as target statements corresponding to the statement to be identified, where the preset theme label of each target statement is the same as the theme classification label of the statement to be identified;
A preset information display module 700, for displaying the preset information corresponding to each target statement.
As can be seen from the above technical solution, the statement recognition device provided by the embodiment of the present invention determines the non-stop words of an obtained statement to be identified as keywords, selects from the preset statement library the candidate statements containing those keywords, and determines the intent classification label of the statement to be identified using the classification model built in advance. It should be noted that the classification model can recognize an intention of the unknown class. When the recognized intent classification label is the unknown class and there are multiple candidate statements, the candidate statements are grouped according to their preset intention labels, and the preset information corresponding to the candidate statements in each group is displayed. Because different groups correspond to different intention types, candidate statements are selected from each group as target statements and the preset information corresponding to each target statement is displayed, which solves the problem that the feedback information is limited to a single type or cannot be provided at all.
Optionally, the above device embodiment further comprises an explicit intent classification module, for determining, when the intent classification label is not the unknown class, the similarity between the statement to be identified and each candidate statement; determining the candidate statement with the maximum similarity exceeding the preset similarity threshold as the target statement; and displaying the preset information corresponding to the target statement.
Optionally, the above target statement determination module 600 comprises:
A similarity determination submodule, for determining the similarity between the statement to be identified and each candidate statement;
A first target statement determination submodule, for sorting by similarity in descending order and, in each group, selecting as target statements the preset number of top-ranked candidate statements that exceed the preset similarity threshold.
Optionally, the above theme and intention determination module 400 comprises:
A classification feature extraction submodule, for extracting multiple classification features from the statement to be identified according to the preset feature-word extraction rule;
A probability value obtaining submodule, for inputting the classification features into the classification model to obtain multiple intention probability values and multiple theme probability values;
An intention label determination submodule, for determining the classification label corresponding to the maximum intention probability value as the intent classification label of the statement to be identified, and determining the classification label corresponding to the maximum theme probability value as the theme classification label of the statement to be identified.
Optionally, the above device embodiment comprises a classification model building module, for obtaining a training set comprising multiple annotated statements, where each annotated statement has its own intention label and theme label; and training on the training set using a preset training method to obtain a classification model, where the classification model is used to classify the intention and theme of the statement to be identified.
Optionally, the explicit intent classification module and the similarity determination submodule, both of which determine the similarity between the statement to be identified and each candidate statement, can each comprise:
A similarity calculation unit, for calculating the semantic similarity, theme-intention similarity, and syntax similarity between the statement to be identified and each candidate statement, where the semantic similarity is the semantic similarity between the keywords of the statement to be identified and the keywords of the candidate statement; the theme-intention similarity is the similarity between the theme and intention of the statement to be identified and those of the candidate statement; and the syntax similarity is the similarity between the syntactic structure of the statement to be identified and that of the candidate statement;
A weighted average calculation unit, for computing the weighted average of the semantic similarity, theme-intention similarity, and syntax similarity corresponding to each candidate statement to obtain the similarity between the statement to be identified and that candidate statement.
Optionally, the similarity calculation unit for calculating the semantic similarity between the statement to be identified and the candidate statement comprises:
A similarity calculation unit, for calculating in turn the word similarity between each keyword of the statement to be identified and each keyword of the candidate statement to obtain a similarity matrix; summing the maximum word similarity of each row of the similarity matrix and calculating the row mean of this sum; summing the maximum word similarity of each column of the similarity matrix and calculating the column mean of this sum; and calculating the mean of the row mean and the column mean to obtain the semantic relatedness of the statement to be identified and the candidate statement.
Optionally, the similarity calculation unit for calculating the theme-intention similarity between the statement to be identified and the candidate statement comprises:
A similarity calculation unit, for judging whether the theme classification label of the statement to be identified is the same as the preset theme classification label of the candidate statement to obtain a first judgment result; judging whether the intent classification label of the statement to be identified is the unknown class to obtain a second judgment result; judging whether the intent classification label of the statement to be identified is the same as the preset intention label of the candidate statement to obtain a third judgment result; when the first judgment result is yes and the second judgment result is yes, determining that the theme-intention similarity is 1; when the first judgment result is yes, the second is no, and the third is yes, determining that the theme-intention similarity is 1; when the first judgment result is yes, the second is no, and the third is no, determining that the theme-intention similarity is a preset value greater than 0 and less than 1; and when the first judgment result is no, determining that the theme-intention similarity is 0.
Optionally, the similarity calculation unit for calculating the syntax similarity between the statement to be identified and the candidate statement comprises:
A similarity calculation unit, for performing syntactic analysis on the statement to be identified to obtain its first syntactic components, and obtaining the preset second syntactic components of the candidate statement; calculating a first word similarity for the identical components of the first and second syntactic components; calculating a second word similarity for the identical modifier components of the first and second syntactic components; obtaining a preset penalty factor for the non-identical components of the first and second syntactic components; and calculating a weighted average using the first word similarity, the second word similarity, and the preset penalty factor to obtain the syntax similarity.
Alternatively, when the keyword determined is multiple, above-mentioned candidate's statement acquisition module 300 comprises:
Keyword number statistics submodule, for adding up the number of each self-contained keyword of each statement in described default statement library;
Submodule chosen in candidate's statement, and for carrying out descending sort according to the number of the keyword comprised, the statement choosing the preceding predetermined number that sorts is candidate's statement.
Optionally, the above keyword determination module 200 comprises:
A word segmentation submodule, for performing word segmentation on the statement to be identified to obtain multiple segmented words;
A stop-word removal submodule, for removing the stop words from the segmented words to obtain the keywords.
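A minimal Python sketch of these two steps follows. Chinese text would need a dedicated word segmenter; as a stand-in, this example splits English text on whitespace, and the stop-word list is a small hypothetical sample rather than one taken from the patent.

```python
# Hypothetical stop-word list for illustration only.
STOP_WORDS = {"how", "do", "i", "a", "the", "to", "is"}


def extract_keywords(statement, stop_words=STOP_WORDS):
    """Segment a statement and drop stop words, keeping word order."""
    # Stand-in segmentation: lowercase and split on whitespace.
    tokens = statement.lower().split()
    return [t for t in tokens if t not in stop_words]


print(extract_keywords("How do I refund a ticket"))  # ['refund', 'ticket']
```

The surviving words are the keywords used to look up candidate statements in the preset statement library.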
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
It should also be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that comprises the element.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. A statement identification method, characterized by comprising:
Obtaining a statement to be identified;
Determining the non-stop words in the statement to be identified as keywords;
Selecting, from a preset statement library, candidate statements that contain the keywords;
Determining a subject classification label and an intent classification label of the statement to be identified by using a pre-built classification model;
When the intent classification label is the unknown class and there are multiple candidate statements, classifying the multiple candidate statements according to their respective preset intention labels to obtain multiple groups;
Determining candidate statements in each group as target statements, wherein the preset theme label of each target statement is identical to the subject classification label of the statement to be identified;
Displaying the preset information corresponding to each target statement.
2. The statement identification method according to claim 1, characterized by further comprising:
When the intent classification label is not the unknown class, determining the similarity between the statement to be identified and each candidate statement;
Determining, as the target statement, the candidate statement whose similarity is the largest and exceeds a preset similarity threshold;
Displaying the preset information corresponding to the target statement.
3. The statement identification method according to claim 1, characterized in that determining candidate statements in each group as target statements comprises:
Determining the similarity between the statement to be identified and each candidate statement;
Sorting in descending order of similarity and, in each group, selecting as target statements a preset number of top-ranked candidate statements whose similarity exceeds a preset similarity threshold.
4. The statement identification method according to claim 1, characterized in that determining the subject classification label and the intent classification label of the statement to be identified by using the pre-built classification model comprises:
Extracting multiple classification features from the statement to be identified according to a preset feature word extraction rule;
Inputting the multiple classification features into the classification model to obtain multiple intention probability values and multiple theme probability values;
Determining the classification label corresponding to the largest intention probability value as the intent classification label of the statement to be identified, and determining the classification label corresponding to the largest theme probability value as the subject classification label of the statement to be identified.
5. The statement identification method according to claim 1, characterized in that the process of building the classification model comprises:
Obtaining a training set comprising multiple labeled statements, wherein each labeled statement has its own intention label and theme label;
Training on the training set with a preset training method to obtain the classification model, wherein the classification model is used to classify the intention and the theme of the statement to be identified.
6. The statement identification method according to claim 2 or 3, characterized in that determining the similarity between the statement to be identified and each candidate statement comprises:
Calculating, for the statement to be identified and each candidate statement, a semantic similarity, a theme-intention similarity and a syntactic similarity, wherein the semantic similarity is the semantic similarity between the keywords of the statement to be identified and the keywords of the candidate statement; the theme-intention similarity is the similarity between the theme and intention of the statement to be identified and the theme and intention of the candidate statement; and the syntactic similarity is the similarity between the syntactic structure of the statement to be identified and the syntactic structure of the candidate statement;
Calculating a weighted average of the semantic similarity, the theme-intention similarity and the syntactic similarity corresponding to each candidate statement, to obtain the similarity between the statement to be identified and each candidate statement.
7. The statement identification method according to claim 6, characterized in that calculating the semantic similarity between the statement to be identified and the candidate statement comprises:
Calculating, in turn, the word similarity between each keyword of the statement to be identified and each keyword of the candidate statement, to obtain a similarity matrix;
Summing the maximum word similarity in each row of the similarity matrix, and calculating the row mean of this sum;
Summing the maximum word similarity in each column of the similarity matrix, and calculating the column mean of this sum;
Calculating the mean of the row mean and the column mean, to obtain the semantic similarity between the statement to be identified and the candidate statement.
8. The statement identification method according to claim 6, characterized in that calculating the theme-intention similarity between the statement to be identified and the candidate statement comprises:
Judging whether the subject classification label of the statement to be identified is identical to the preset theme classification label of the candidate statement, obtaining a first judgment result;
Judging whether the intent classification label of the statement to be identified is the unknown class, obtaining a second judgment result;
Judging whether the intent classification label of the statement to be identified is identical to the preset intention label of the candidate statement, obtaining a third judgment result;
When the first judgment result is yes and the second judgment result is yes, determining that the theme-intention similarity is 1;
When the first judgment result is yes, the second judgment result is no and the third judgment result is yes, determining that the theme-intention similarity is 1;
When the first judgment result is yes, the second judgment result is no and the third judgment result is no, determining that the theme-intention similarity is a preset value greater than 0 and less than 1;
When the first judgment result is no, determining that the theme-intention similarity is 0.
9. The statement identification method according to claim 6, characterized in that calculating the syntactic similarity between the statement to be identified and the candidate statement comprises:
Performing syntactic analysis on the statement to be identified to obtain first syntactic constituents of the statement to be identified, and obtaining preset second syntactic constituents of the candidate statement;
Calculating a first word similarity over the identical components of the first syntactic constituents and the second syntactic constituents;
Calculating a second word similarity over the identical modifier components of the first syntactic constituents and the second syntactic constituents;
Obtaining a preset penalty factor for the non-identical components of the first syntactic constituents and the second syntactic constituents;
Calculating a weighted mean of the first word similarity, the second word similarity and the preset penalty factor, to obtain the syntactic similarity.
10. The statement identification method according to claim 1, characterized in that, when multiple keywords are determined, selecting from the preset statement library the candidate statements that contain the keywords comprises:
Counting, for each statement in the preset statement library, the number of keywords of the statement to be identified that it contains;
Sorting in descending order of the number of keywords contained, and selecting a preset number of top-ranked statements as candidate statements.
11. The statement identification method according to claim 1, characterized in that determining the non-stop words in the statement to be identified as keywords comprises:
Performing word segmentation on the statement to be identified to obtain multiple segmented words;
Removing the stop words from the multiple segmented words to obtain the keywords.
12. A statement identification device, characterized by comprising:
A to-be-identified statement acquisition module, for obtaining a statement to be identified;
A keyword determination module, for determining the non-stop words in the statement to be identified as keywords;
A candidate statement acquisition module, for selecting, from a preset statement library, candidate statements that contain the keywords;
A theme and intention determination module, for determining a subject classification label and an intent classification label of the statement to be identified by using a pre-built classification model;
A candidate statement grouping module, for classifying, when the intent classification label is the unknown class and there are multiple candidate statements, the multiple candidate statements according to their respective preset intention labels to obtain multiple groups;
A target statement determination module, for determining candidate statements in each group as target statements corresponding to the statement to be identified, wherein the preset theme label of each target statement is identical to the subject classification label of the statement to be identified;
A preset information display module, for displaying the preset information corresponding to each target statement.
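As an illustration of the similarity computation recited in claims 6 and 7, the following Python sketch builds the keyword similarity matrix, averages the row-wise and column-wise maxima, and then combines three component scores by a weighted average. The function names, the exact-match word similarity and the weights (0.5, 0.3, 0.2) are assumptions; the claims do not fix any of them.

```python
def exact_match(a, b):
    """Placeholder word similarity: 1.0 for identical words, else 0.0."""
    return 1.0 if a == b else 0.0


def semantic_similarity(query_keywords, cand_keywords,
                        word_similarity=exact_match):
    """Mean of the row-mean and column-mean of per-row/column maxima."""
    if not query_keywords or not cand_keywords:
        return 0.0
    # Rows: keywords of the statement to be identified; columns: candidate's.
    matrix = [[word_similarity(q, c) for c in cand_keywords]
              for q in query_keywords]
    row_mean = sum(max(row) for row in matrix) / len(query_keywords)
    col_mean = sum(max(col) for col in zip(*matrix)) / len(cand_keywords)
    return (row_mean + col_mean) / 2


def overall_similarity(semantic, theme_intention, syntactic,
                       weights=(0.5, 0.3, 0.2)):
    """Weighted average of the three component similarities."""
    scores = (semantic, theme_intention, syntactic)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
```

Averaging in both directions makes the score symmetric in spirit: a candidate with many extra keywords that the query does not cover lowers the column mean even when every query keyword finds a perfect match.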
CN201510024299.9A 2015-01-16 2015-01-16 Statement identification method and device Active CN104516986B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510024299.9A CN104516986B (en) 2015-01-16 2015-01-16 Statement identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510024299.9A CN104516986B (en) 2015-01-16 2015-01-16 Statement identification method and device

Publications (2)

Publication Number Publication Date
CN104516986A true CN104516986A (en) 2015-04-15
CN104516986B CN104516986B (en) 2018-01-16

Family

ID=52792285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510024299.9A Active CN104516986B (en) 2015-01-16 2015-01-16 Statement identification method and device

Country Status (1)

Country Link
CN (1) CN104516986B (en)

Cited By (51)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105550372A (en) * 2016-01-28 2016-05-04 浪潮软件集团有限公司 Sentence training device and method and information extraction system
CN106294341A (en) * 2015-05-12 2017-01-04 阿里巴巴集团控股有限公司 A kind of Intelligent Answer System and theme method of discrimination thereof and device
CN106383835A (en) * 2016-08-29 2017-02-08 华东师范大学 Natural language knowledge exploration system based on formal semantics reasoning and deep learning
CN106446022A (en) * 2016-08-29 2017-02-22 华东师范大学 Formal semantic reasoning and deep learning-based natural language knowledge mining method
CN106528531A (en) * 2016-10-31 2017-03-22 北京百度网讯科技有限公司 Artificial intelligence-based intention analysis method and apparatus
CN106778862A (en) * 2016-12-12 2017-05-31 上海智臻智能网络科技股份有限公司 A kind of information classification approach and device
CN107193865A (en) * 2017-04-06 2017-09-22 上海奔影网络科技有限公司 Natural language is intended to understanding method and device in man-machine interaction
CN107315731A (en) * 2016-04-27 2017-11-03 北京京东尚科信息技术有限公司 Text similarity computing method
CN107577664A (en) * 2017-08-29 2018-01-12 百度在线网络技术(北京)有限公司 Method and apparatus for display information
CN107657949A (en) * 2017-04-14 2018-02-02 深圳市人马互动科技有限公司 The acquisition methods and device of game data
CN107690781A (en) * 2015-04-16 2018-02-13 三星电子株式会社 Method and apparatus for recommending answer message
CN107688608A (en) * 2017-07-28 2018-02-13 合肥美的智能科技有限公司 Intelligent sound answering method, device, computer equipment and readable storage medium storing program for executing
CN107797981A (en) * 2016-08-31 2018-03-13 科大讯飞股份有限公司 A kind of target text recognition methods and device
CN107977358A (en) * 2017-11-23 2018-05-01 浪潮金融信息技术有限公司 Sentence recognition methods and device, computer-readable storage medium and terminal
CN107992472A (en) * 2017-11-23 2018-05-04 浪潮金融信息技术有限公司 Sentence similarity computational methods and device, computer-readable storage medium and terminal
CN108121721A (en) * 2016-11-28 2018-06-05 渡鸦科技(北京)有限责任公司 Intension recognizing method and device
CN108334490A (en) * 2017-04-07 2018-07-27 腾讯科技(深圳)有限公司 Keyword extracting method and keyword extracting device
CN108363812A (en) * 2018-03-15 2018-08-03 苏州思必驰信息科技有限公司 Realize the method and device marked in advance
CN109213887A (en) * 2018-08-22 2019-01-15 广州海鸥住宅工业股份有限公司 A kind of reminding method and device of negative pressure cabinet storage article
CN109344395A (en) * 2018-08-30 2019-02-15 腾讯科技(深圳)有限公司 A kind of data processing method, device, server and storage medium
CN109344830A (en) * 2018-08-17 2019-02-15 平安科技(深圳)有限公司 Sentence output, model training method, device, computer equipment and storage medium
CN109388705A (en) * 2017-08-07 2019-02-26 芋头科技(杭州)有限公司 A kind of text intent classifier method
CN109582768A (en) * 2018-11-23 2019-04-05 北京搜狗科技发展有限公司 A kind of text entry method and device
CN109710939A (en) * 2018-12-28 2019-05-03 北京百度网讯科技有限公司 Method and apparatus for determining theme
CN109753561A (en) * 2019-01-16 2019-05-14 长安汽车金融有限公司 A kind of generation method automatically replied and device
CN109800326A (en) * 2019-01-24 2019-05-24 广州虎牙信息科技有限公司 A kind of method for processing video frequency, device, equipment and storage medium
CN110020421A (en) * 2018-01-10 2019-07-16 北京京东尚科信息技术有限公司 The session information method of abstracting and system of communication software, equipment and storage medium
CN110163281A (en) * 2019-05-20 2019-08-23 腾讯科技(深圳)有限公司 Statement classification model training method and device
CN110168544A (en) * 2016-12-27 2019-08-23 夏普株式会社 Answering device, the control method of answering device and control program
CN110309289A (en) * 2019-08-23 2019-10-08 深圳市优必选科技股份有限公司 Sentence generation method, sentence generation device and intelligent equipment
CN110378704A (en) * 2019-07-23 2019-10-25 珠海格力电器股份有限公司 Opinion feedback method based on fuzzy recognition, storage medium and terminal equipment
CN110503143A (en) * 2019-08-14 2019-11-26 平安科技(深圳)有限公司 Research on threshold selection, equipment, storage medium and device based on intention assessment
CN110717017A (en) * 2019-10-17 2020-01-21 腾讯科技(深圳)有限公司 Method for processing corpus
CN110852100A (en) * 2019-10-30 2020-02-28 北京大米科技有限公司 Keyword extraction method, keyword extraction device, electronic equipment and medium
CN111126038A (en) * 2019-12-24 2020-05-08 北京明略软件系统有限公司 Information acquisition model generation method and device and information acquisition method and device
CN111145742A (en) * 2019-12-18 2020-05-12 中国人民武装警察部队警官学院 Plan command execution method and system based on voice instruction
CN111178081A (en) * 2018-11-09 2020-05-19 中移(杭州)信息技术有限公司 Semantic recognition method, server, electronic device and computer storage medium
CN111191030A (en) * 2019-12-20 2020-05-22 北京淇瑀信息科技有限公司 Single sentence intention identification method, device and system based on classification
CN111259918A (en) * 2018-11-30 2020-06-09 重庆小雨点小额贷款有限公司 Method and device for labeling intention label, server and storage medium
CN111325037A (en) * 2020-03-05 2020-06-23 苏宁云计算有限公司 Text intention recognition method and device, computer equipment and storage medium
CN111613219A (en) * 2020-05-15 2020-09-01 深圳前海微众银行股份有限公司 Voice data recognition method, apparatus and medium
CN111737425A (en) * 2020-02-28 2020-10-02 北京沃东天骏信息技术有限公司 Response method, response device, server and storage medium
CN111930946A (en) * 2020-08-18 2020-11-13 哈尔滨工程大学 Patent classification method based on similarity measurement
CN112149410A (en) * 2020-08-10 2020-12-29 招联消费金融有限公司 Semantic recognition method and device, computer equipment and storage medium
CN112256845A (en) * 2020-09-14 2021-01-22 北京三快在线科技有限公司 Intention recognition method, device, electronic equipment and computer readable storage medium
CN112396444A (en) * 2019-08-15 2021-02-23 阿里巴巴集团控股有限公司 Intelligent robot response method and device
CN112527965A (en) * 2020-12-18 2021-03-19 国家电网有限公司客户服务中心 Automatic question answering implementation method and device based on combination of professional library and chatting library
US10965622B2 (en) 2015-04-16 2021-03-30 Samsung Electronics Co., Ltd. Method and apparatus for recommending reply message
CN112989839A (en) * 2019-12-18 2021-06-18 中国科学院声学研究所 Keyword feature-based intent recognition method and system embedded in language model
CN113987174A (en) * 2021-10-22 2022-01-28 上海携旅信息技术有限公司 Core statement extraction method, system, equipment and storage medium for classification label
US11557284B2 (en) 2020-01-03 2023-01-17 International Business Machines Corporation Cognitive analysis for speech recognition using multi-language vector representations

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101593204A (en) * 2009-06-05 2009-12-02 北京大学 A kind of emotion tendency analysis system based on news comment webpage
US20110231448A1 (en) * 2010-03-22 2011-09-22 International Business Machines Corporation Device and method for generating opinion pairs having sentiment orientation based impact relations

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张彬: ""文本情感倾向性分析与研究"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (70)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107690781A (en) * 2015-04-16 2018-02-13 三星电子株式会社 Method and apparatus for recommending answer message
US10965622B2 (en) 2015-04-16 2021-03-30 Samsung Electronics Co., Ltd. Method and apparatus for recommending reply message
CN106294341A (en) * 2015-05-12 2017-01-04 阿里巴巴集团控股有限公司 A kind of Intelligent Answer System and theme method of discrimination thereof and device
CN105550372A (en) * 2016-01-28 2016-05-04 浪潮软件集团有限公司 Sentence training device and method and information extraction system
CN107315731A (en) * 2016-04-27 2017-11-03 北京京东尚科信息技术有限公司 Text similarity computing method
CN106383835A (en) * 2016-08-29 2017-02-08 华东师范大学 Natural language knowledge exploration system based on formal semantics reasoning and deep learning
CN106446022A (en) * 2016-08-29 2017-02-22 华东师范大学 Formal semantic reasoning and deep learning-based natural language knowledge mining method
CN107797981A (en) * 2016-08-31 2018-03-13 科大讯飞股份有限公司 A kind of target text recognition methods and device
CN106528531A (en) * 2016-10-31 2017-03-22 北京百度网讯科技有限公司 Artificial intelligence-based intention analysis method and apparatus
CN106528531B (en) * 2016-10-31 2019-09-03 北京百度网讯科技有限公司 Intention analysis method and device based on artificial intelligence
CN108121721A (en) * 2016-11-28 2018-06-05 渡鸦科技(北京)有限责任公司 Intension recognizing method and device
CN106778862A (en) * 2016-12-12 2017-05-31 上海智臻智能网络科技股份有限公司 A kind of information classification approach and device
CN106778862B (en) * 2016-12-12 2020-04-21 上海智臻智能网络科技股份有限公司 Information classification method and device
CN110168544A (en) * 2016-12-27 2019-08-23 夏普株式会社 Answering device, the control method of answering device and control program
CN107193865A (en) * 2017-04-06 2017-09-22 上海奔影网络科技有限公司 Natural language is intended to understanding method and device in man-machine interaction
CN107193865B (en) * 2017-04-06 2020-03-10 上海奔影网络科技有限公司 Natural language intention understanding method and device in man-machine interaction
CN108334490B (en) * 2017-04-07 2021-05-07 腾讯科技(深圳)有限公司 Keyword extraction method and keyword extraction device
CN108334490A (en) * 2017-04-07 2018-07-27 腾讯科技(深圳)有限公司 Keyword extracting method and keyword extracting device
CN107657949A (en) * 2017-04-14 2018-02-02 深圳市人马互动科技有限公司 The acquisition methods and device of game data
CN107688608A (en) * 2017-07-28 2018-02-13 合肥美的智能科技有限公司 Intelligent sound answering method, device, computer equipment and readable storage medium storing program for executing
CN109388705A (en) * 2017-08-07 2019-02-26 芋头科技(杭州)有限公司 A kind of text intent classifier method
CN109388705B (en) * 2017-08-07 2020-05-19 芋头科技(杭州)有限公司 Text intention classification method
CN107577664A (en) * 2017-08-29 2018-01-12 百度在线网络技术(北京)有限公司 Method and apparatus for display information
CN107992472A (en) * 2017-11-23 2018-05-04 浪潮金融信息技术有限公司 Sentence similarity computational methods and device, computer-readable storage medium and terminal
CN107977358A (en) * 2017-11-23 2018-05-01 浪潮金融信息技术有限公司 Sentence recognition methods and device, computer-readable storage medium and terminal
CN110020421A (en) * 2018-01-10 2019-07-16 北京京东尚科信息技术有限公司 The session information method of abstracting and system of communication software, equipment and storage medium
CN108363812A (en) * 2018-03-15 2018-08-03 苏州思必驰信息科技有限公司 Realize the method and device marked in advance
CN109344830A (en) * 2018-08-17 2019-02-15 平安科技(深圳)有限公司 Sentence output, model training method, device, computer equipment and storage medium
CN109213887A (en) * 2018-08-22 2019-01-15 广州海鸥住宅工业股份有限公司 A kind of reminding method and device of negative pressure cabinet storage article
CN109344395B (en) * 2018-08-30 2022-05-20 腾讯科技(深圳)有限公司 Data processing method, device, server and storage medium
CN109344395A (en) * 2018-08-30 2019-02-15 腾讯科技(深圳)有限公司 A kind of data processing method, device, server and storage medium
CN111178081A (en) * 2018-11-09 2020-05-19 中移(杭州)信息技术有限公司 Semantic recognition method, server, electronic device and computer storage medium
CN109582768A (en) * 2018-11-23 2019-04-05 北京搜狗科技发展有限公司 A kind of text entry method and device
CN111259918B (en) * 2018-11-30 2023-06-20 重庆小雨点小额贷款有限公司 Method and device for labeling intention labels, server and storage medium
CN111259918A (en) * 2018-11-30 2020-06-09 重庆小雨点小额贷款有限公司 Method and device for labeling intention label, server and storage medium
CN109710939B (en) * 2018-12-28 2023-06-09 北京百度网讯科技有限公司 Method and device for determining theme
CN109710939A (en) * 2018-12-28 2019-05-03 北京百度网讯科技有限公司 Method and apparatus for determining theme
CN109753561B (en) * 2019-01-16 2021-04-27 长安汽车金融有限公司 Automatic reply generation method and device
CN109753561A (en) * 2019-01-16 2019-05-14 长安汽车金融有限公司 A kind of generation method automatically replied and device
CN109800326B (en) * 2019-01-24 2021-07-02 广州虎牙信息科技有限公司 Video processing method, device, equipment and storage medium
CN109800326A (en) * 2019-01-24 2019-05-24 广州虎牙信息科技有限公司 A kind of method for processing video frequency, device, equipment and storage medium
CN110163281B (en) * 2019-05-20 2024-07-12 腾讯科技(深圳)有限公司 Sentence classification model training method and device
CN110163281A (en) * 2019-05-20 2019-08-23 腾讯科技(深圳)有限公司 Statement classification model training method and device
CN110378704B (en) * 2019-07-23 2021-10-22 珠海格力电器股份有限公司 Opinion feedback method based on fuzzy recognition, storage medium and terminal equipment
CN110378704A (en) * 2019-07-23 2019-10-25 珠海格力电器股份有限公司 Opinion feedback method based on fuzzy recognition, storage medium and terminal equipment
CN110503143B (en) * 2019-08-14 2024-03-19 平安科技(深圳)有限公司 Threshold selection method, device, storage medium and device based on intention recognition
CN110503143A (en) * 2019-08-14 2019-11-26 平安科技(深圳)有限公司 Research on threshold selection, equipment, storage medium and device based on intention assessment
CN112396444A (en) * 2019-08-15 2021-02-23 阿里巴巴集团控股有限公司 Intelligent robot response method and device
CN110309289A (en) * 2019-08-23 2019-10-08 深圳市优必选科技股份有限公司 Sentence generation method, sentence generation device and intelligent equipment
CN110717017A (en) * 2019-10-17 2020-01-21 腾讯科技(深圳)有限公司 Method for processing corpus
CN110852100B (en) * 2019-10-30 2023-07-21 北京大米科技有限公司 Keyword extraction method and device, electronic equipment and medium
CN110852100A (en) * 2019-10-30 2020-02-28 北京大米科技有限公司 Keyword extraction method, keyword extraction device, electronic equipment and medium
CN112989839A (en) * 2019-12-18 2021-06-18 中国科学院声学研究所 Keyword feature-based intent recognition method and system embedded in language model
CN111145742A (en) * 2019-12-18 2020-05-12 中国人民武装警察部队警官学院 Plan command execution method and system based on voice instruction
CN111191030B (en) * 2019-12-20 2024-04-26 北京淇瑀信息科技有限公司 Method, device and system for identifying single sentence intention based on classification
CN111191030A (en) * 2019-12-20 2020-05-22 北京淇瑀信息科技有限公司 Single sentence intention identification method, device and system based on classification
CN111126038A (en) * 2019-12-24 2020-05-08 北京明略软件系统有限公司 Information acquisition model generation method and device and information acquisition method and device
CN111126038B (en) * 2019-12-24 2023-05-23 北京明略软件系统有限公司 Information acquisition model generation method and device and information acquisition method and device
US11557284B2 (en) 2020-01-03 2023-01-17 International Business Machines Corporation Cognitive analysis for speech recognition using multi-language vector representations
CN111737425B (en) * 2020-02-28 2024-03-01 北京汇钧科技有限公司 Response method, device, server and storage medium
CN111737425A (en) * 2020-02-28 2020-10-02 北京沃东天骏信息技术有限公司 Response method, response device, server and storage medium
CN111325037A (en) * 2020-03-05 2020-06-23 苏宁云计算有限公司 Text intention recognition method and device, computer equipment and storage medium
CN111613219A (en) * 2020-05-15 2020-09-01 深圳前海微众银行股份有限公司 Voice data recognition method, apparatus and medium
CN111613219B (en) * 2020-05-15 2023-10-27 深圳前海微众银行股份有限公司 Voice data recognition method, equipment and medium
CN112149410A (en) * 2020-08-10 2020-12-29 招联消费金融有限公司 Semantic recognition method and device, computer equipment and storage medium
CN111930946A (en) * 2020-08-18 2020-11-13 哈尔滨工程大学 Patent classification method based on similarity measurement
CN112256845A (en) * 2020-09-14 2021-01-22 北京三快在线科技有限公司 Intention recognition method, device, electronic equipment and computer readable storage medium
CN112527965A (en) * 2020-12-18 2021-03-19 国家电网有限公司客户服务中心 Automatic question answering implementation method and device based on combination of professional library and chatting library
CN113987174A (en) * 2021-10-22 2022-01-28 上海携旅信息技术有限公司 Core statement extraction method, system, equipment and storage medium for classification label
CN113987174B (en) * 2021-10-22 2024-08-23 上海携旅信息技术有限公司 Method, system, equipment and storage medium for extracting core sentence of classification label

Also Published As

Publication number Publication date
CN104516986B (en) 2018-01-16

Similar Documents

Publication Publication Date Title
CN104516986A (en) Statement identification method and device
Gu et al. " what parts of your apps are loved by users?"(T)
CN104252533B (en) Searching method and searcher
CN104111933B (en) Obtain business object label, set up the method and device of training pattern
CN103678564B (en) Internet product research system based on data mining
CN110309289B (en) Sentence generation method, sentence generation device and intelligent equipment
CN104778209B (en) A kind of opining mining method for millions scale news analysis
US10198506B2 (en) System and method of sentiment data generation
US20080097937A1 (en) Distributed method for integrating data mining and text categorization techniques
CN105844424A (en) Product quality problem discovery and risk assessment method based on network comments
CN105938495A (en) Entity relationship recognition method and apparatus
CN109213925B (en) Legal text searching method
CN103049435A (en) Text fine granularity sentiment analysis method and text fine granularity sentiment analysis device
CN106503184B (en) Determine the method and device of the affiliated class of service of target text
CN104008091A (en) Sentiment value based web text sentiment analysis method
CN103678271B (en) A kind of text correction method and subscriber equipment
CN106570180A (en) Artificial intelligence based voice searching method and device
CN109241332B (en) Method and system for determining semantics through voice
CN105426514A (en) Personalized mobile APP recommendation method
CN106021226A (en) Text abstract generation method and apparatus
CN104866511A (en) Method and equipment for adding multi-media files
CN106897290B (en) Method and device for establishing keyword model
CN109117470B (en) Evaluation relation extraction method and device for evaluating text information
CN101980210A (en) Marked word classifying and grading method and system
CN102081602A (en) Method and equipment for determining category of unlisted word

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant