
CN104464723B - Voice interaction method and system - Google Patents

Voice interaction method and system

Info

Publication number
CN104464723B
Authority
CN
China
Prior art keywords
voice
word
voice data
prefix word
prefix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410782284.4A
Other languages
Chinese (zh)
Other versions
CN104464723A (en)
Inventor
张凯
陈盛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN201410782284.4A
Publication of CN104464723A
Application granted
Publication of CN104464723B
Legal status: Active
Anticipated expiration

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a voice interaction method and system. The method includes: recording audio data input by a user; performing endpoint detection on the audio data until a front endpoint of speech is detected; performing prefix word detection on the audio data starting from the front endpoint of speech until prefix-word speech is detected, where a prefix word is a word that reflects the type of action to be performed; obtaining the speech segment in the audio data that starts at the front endpoint of the prefix-word speech as the voice instruction; performing speech recognition on the voice instruction; and, if the speech recognition result is valid, performing the operation corresponding to it. Because the speech segment starting at the front endpoint of the prefix-word speech is taken as the voice instruction, and words that reflect the type of action are used as prefix words, the prefix word and the voice instruction are combined. This avoids the failure to obtain a valid recognition result that is caused by forcibly cutting the voice instruction, and improves the efficiency of voice interaction.

Description

Voice interaction method and system
Technical field
The present invention relates to the field of voice interaction, and in particular to a voice interaction method and system.
Background
To prevent a mobile device such as a mobile phone from mistaking nearby speech for a voice instruction while it is on standby, the device must complete the following operations each time the user starts its voice interaction function: (1) record the audio data input by the user; (2) obtain the audio data and perform wake-up detection until wake-up succeeds; (3) after wake-up succeeds, prompt the user to input a voice instruction; (4) after the prompt, record the audio data input by the user again; (5) obtain the speech segment in the newly recorded audio data as the voice instruction; (6) perform speech recognition on the voice instruction to obtain a speech recognition result; (7) determine whether the speech recognition result is valid and, if it is, execute it. Correspondingly, each time the user starts the voice interaction function, the user must: (1) say the wake word to wake up the mobile device; (2) when the device prompts for a voice instruction, say the instruction, for example "call Zhang San". This voice interaction method is therefore inconvenient to use.
To address this inconvenience, a wake-word-based voice interaction method has also been proposed, in which, after wake-up succeeds, the device directly processes the voice instruction that the user speaks immediately after the wake word. With this method, the user says the wake word and the voice instruction in one continuous utterance. For example, to call Zhang San, the user says the wake word followed by "call Zhang San", where the wake word (rendered in this translation as "language point leads to") is a fixed, preset wake word and "call Zhang San" is the voice instruction. Although this method is somewhat more convenient, users usually speak continuously, so the wake word runs into the voice instruction that follows it. Forcibly cutting the audio data at the point where wake-up succeeds and treating the speech segment after that point as the voice instruction is therefore likely to produce an incomplete instruction. The speech recognition module then fails to produce a valid recognition result, its recognition accuracy falls, and the efficiency of voice interaction is reduced. In addition, this method works only with a fixed wake word that the user has to memorize; without it the voice interaction process cannot be started. The ease of use of this method therefore still needs improvement.
Summary of the invention
The purpose of embodiments of the present invention is to overcome the low interaction efficiency of existing voice interaction methods by providing an efficient voice interaction method based on prefix words.
To achieve this purpose, the technical solution adopted by the present invention is a voice interaction method, including:
Recording audio data input by a user;
Performing endpoint detection on the audio data until a front endpoint of speech is detected;
Performing prefix word detection on the audio data starting from the front endpoint of speech until prefix-word speech is detected, where the prefix word is a word that reflects the type of action to be performed, and the prefix word is combined with the voice instruction that expresses the user's intent;
Obtaining the speech segment in the audio data that starts at the front endpoint of the prefix-word speech as the voice instruction, until an instruction acquisition termination event is detected;
Performing speech recognition on the voice instruction to obtain a speech recognition result;
Determining whether the speech recognition result is valid and, if it is, performing the operation corresponding to the speech recognition result.
Preferably, the method further includes:
Performing noise reduction on the audio data before performing endpoint detection on it.
Preferably, performing prefix word detection on the audio data starting from the front endpoint of speech includes:
Detecting, based on a parallel search network that includes a prefix word model and a filler model, whether the prefix-word speech is present in the audio data starting from the front endpoint of speech.
Preferably, determining whether the speech recognition result is valid includes:
Determining whether a command word network contains a command word that matches the speech recognition result and, if so, determining that the speech recognition result is valid.
Preferably, the instruction acquisition termination event includes: the speech segment ending, and the speech segment having lasted for a set time.
To achieve the above purpose, the technical solution adopted by the present invention is also a voice interaction system, including:
A recording module, for recording audio data input by a user;
An endpoint detection module, for performing endpoint detection on the audio data until a front endpoint of speech is detected;
A prefix word detection module, for performing prefix word detection on the audio data starting from the front endpoint of speech until prefix-word speech is detected, where the prefix word is a word that reflects the type of action to be performed, and the prefix word is combined with the voice instruction that expresses the user's intent;
A voice activity detection module, for obtaining the speech segment in the audio data that starts at the front endpoint of the prefix-word speech as the voice instruction, until an instruction acquisition termination event is detected;
A speech recognition module, for performing speech recognition on the voice instruction to obtain a speech recognition result;
A judgment module, for determining whether the speech recognition result is valid; and
An execution module, for performing the operation corresponding to a valid speech recognition result.
Preferably, the system further includes:
A noise reduction module, connected to the recording module and to the endpoint detection module, for performing noise reduction on the audio data recorded by the recording module and sending the noise-reduced audio data to the endpoint detection module.
Preferably, the prefix word detection module is specifically configured to detect, based on a parallel search network that includes a prefix word model and a filler model, whether the prefix-word speech is present in the audio data starting from the front endpoint of speech.
Preferably, the judgment module is specifically configured to determine whether a command word network contains a command word that matches the speech recognition result and, if so, to determine that the speech recognition result is valid.
Preferably, the instruction acquisition termination event includes: the speech segment ending, and the speech segment having lasted for a set time.
The beneficial effects of the present invention are as follows. The voice interaction method and system take the speech segment in the audio data that starts at the front endpoint of the prefix-word speech as the voice instruction, and use words that reflect the type of action to be performed, such as "make a call to", "send a text message to", or "open QQ", as prefix words, so that the prefix word and the voice instruction are combined. This avoids the failure to obtain a valid recognition result that is caused by forcibly cutting the voice instruction, which improves the efficiency of voice interaction. Moreover, because the prefix words follow everyday speaking habits, the user does not have to memorize a fixed wake word; simply saying the action to be performed, in ordinary language, both wakes up voice interaction and triggers the action, which further improves ease of use.
Brief description of the drawings
Fig. 1 shows a flowchart of one embodiment of the voice interaction method according to the present invention;
Fig. 2 shows a block diagram of one implementation structure of the voice interaction system according to the present invention.
Detailed description of the embodiments
Embodiments of the invention are described in detail below. Examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the invention, and are not to be construed as limiting the claims.
To solve the problem that existing voice interaction methods reduce interaction efficiency by forcibly cutting the voice instruction, the present invention provides a more efficient voice interaction method. As shown in Fig. 1, the method includes the following steps:
Step S1: Record the audio data input by the user.
Here, the recorded audio data can be stored in a fixed-length circular buffer, and the storage address recorded, so that later steps can retrieve the audio data.
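The fixed-length circular buffer of step S1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name `RingBuffer`, the use of absolute sample addresses, and the list-based storage are all assumptions made for the example.

```python
class RingBuffer:
    """Fixed-length circular buffer for audio samples (illustrative sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = [0] * capacity
        self.total_written = 0  # absolute count of samples ever written

    def write(self, samples):
        """Store samples, overwriting the oldest ones when full.

        Returns the absolute address of the first sample written, which
        later steps can use to find the audio again.
        """
        start = self.total_written
        for s in samples:
            self.data[self.total_written % self.capacity] = s
            self.total_written += 1
        return start

    def read(self, start, length):
        """Read `length` samples beginning at absolute address `start`."""
        oldest = max(0, self.total_written - self.capacity)
        if start < oldest or start + length > self.total_written:
            raise ValueError("requested range is not in the buffer")
        return [self.data[(start + i) % self.capacity] for i in range(length)]


buf = RingBuffer(capacity=8)
addr = buf.write([1, 2, 3, 4, 5])
buf.write([6, 7, 8, 9, 10])          # wraps around; samples 1 and 2 are overwritten
recent = buf.read(start=4, length=4)  # [5, 6, 7, 8]
```

Tracking absolute addresses rather than raw slot indices is one way to tell whether a requested range has already been overwritten; a real recorder would more likely use a byte array and a lock or a lock-free queue.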
Step S2: Perform endpoint detection on the audio data until a front endpoint of speech is detected.
The front endpoint of speech is the boundary frame from a non-speech segment to a speech segment. When processing the audio data, the audio data is first divided into frames and an energy feature is then computed for each frame. A frame whose energy feature exceeds a set value is regarded as speech; otherwise it is regarded as non-speech.
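The energy-based decision described above can be illustrated with a short sketch. The function names, the non-overlapping framing, and the mean-square energy measure are assumptions chosen for simplicity; the text only states that a frame is speech when its energy feature exceeds a set value.

```python
def frame_energies(samples, frame_len):
    """Split samples into non-overlapping frames; return mean energy per frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def find_front_endpoint(samples, frame_len, threshold):
    """Return the index of the first frame classified as speech, or None."""
    for i, energy in enumerate(frame_energies(samples, frame_len)):
        if energy > threshold:
            return i  # boundary frame from non-speech to speech
    return None


silence = [0.01, -0.01] * 4            # 8 low-energy samples (2 frames)
speech = [0.5, -0.4, 0.6, -0.5] * 2    # 8 high-energy samples (2 frames)
front = find_front_endpoint(silence + speech, frame_len=4, threshold=0.01)
```

Practical detectors usually use overlapping frames and require several consecutive speech frames before declaring an endpoint, so that a single click or pop does not trigger detection.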
Here, audio data is written to the circular buffer continuously as recording proceeds, and as it is written it can be read back continuously from the circular buffer for endpoint detection. Endpoint detection can therefore run essentially in parallel with storing the recorded audio data in the circular buffer, which improves processing efficiency.
Step S3: Perform prefix word detection on the audio data starting from the front endpoint of speech until prefix-word speech is detected, where the prefix word is a word that reflects the type of action to be performed, so that the prefix word used to wake up voice interaction is combined naturally with the voice instruction that expresses the user's intent. Words that reflect the type of action are, for example, "make a call to", "send a text message to", "open QQ", "open WeChat", and other expressions that follow everyday speaking habits.
The main function of prefix word detection is to decide whether to wake up the voice interaction operation. If prefix-word speech is detected, speech recognition is started so that the corresponding action can be performed according to the user's intent.
Prefix word detection may, for example, include the following steps:
Step S31, acoustic feature extraction: extract from the audio (prefix word detection is usually performed one speech segment at a time) features that are discriminative and based on human auditory characteristics. The MFCC (Mel-Frequency Cepstral Coefficient) features used in speech recognition are usually chosen as the acoustic features.
Step S32, prefix word detection: using the trained acoustic model, compute acoustic scores for the extracted acoustic features on the prefix word detection network. If the best-scoring path contains the prefix word to be detected, a prefix word is deemed detected; otherwise return to step S31 and continue extracting acoustic features.
In addition to steps S31 and S32, the following step S33 may be performed after a prefix word is deemed detected, in order to reduce the false detection rate.
Step S33, prefix word confirmation: using the trained acoustic model, confirm the prefix word on the prefix word confirmation network from the extracted acoustic features, obtaining a final confirmation score. To decide whether the detected prefix word is genuine, compare its final confirmation score with a preset threshold. If the final confirmation score is greater than or equal to the threshold, the prefix word is considered genuine and voice wake-up succeeds. If the final confirmation score is below the threshold, the prefix word is considered false, and the method returns to step S31 and continues extracting acoustic features.
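The control flow of steps S32 and S33 (detect a candidate, then accept it only if its confirmation score reaches the threshold) can be sketched as below. The acoustic scoring itself is not modelled: `detect` and `confirm_score` are hypothetical callables standing in for the detection and confirmation networks, and the toy data uses text with a made-up score in place of acoustic features.

```python
def wake_on_prefix_word(segments, detect, confirm_score, threshold):
    """Scan segments until a detected prefix word passes confirmation.

    detect(segment) -> prefix word or None      (stands in for step S32)
    confirm_score(segment, word) -> float       (stands in for step S33)
    Returns (segment_index, word) on success, or None if nothing is confirmed.
    """
    for index, segment in enumerate(segments):
        word = detect(segment)
        if word is None:
            continue                   # no candidate: keep extracting features
        if confirm_score(segment, word) >= threshold:
            return index, word         # genuine prefix word: wake-up succeeds
        # Score below threshold: treat as a false alarm and keep scanning.
    return None


# Toy stand-ins: each "segment" is (text, score) instead of acoustic features.
PREFIX_WORDS = ["call", "send a message to"]


def toy_detect(segment):
    text, _ = segment
    return next((w for w in PREFIX_WORDS if text.startswith(w)), None)


def toy_confirm_score(segment, word):
    return segment[1]


segments = [("hello there", 0.9), ("call me maybe", 0.3), ("call Zhang San", 0.8)]
result = wake_on_prefix_word(segments, toy_detect, toy_confirm_score, threshold=0.5)
```

Separating detection from confirmation lets the first pass be permissive (few misses) while the second pass filters out false alarms.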
Here, words that reflect the type of action and follow everyday speaking habits can be added to the prefix word detection network and the prefix word confirmation network. The method of the invention also allows users to add such words to both networks according to their own speaking habits. The method is therefore no longer limited to a fixed wake word, which further improves its convenience in use.
The prefix word detection network can be implemented by computing the best-scoring path, which is given by:
$$W^{*} = \arg\max_{W} P(W \mid X) = \arg\max_{W} \frac{P(X \mid W)\,P(W)}{P(X)}$$
Here $X$ denotes the acoustic feature vectors extracted from the audio data, and $W^{*}$ denotes the word sequence with the highest score. The conditional probability $P(X \mid W)$ is the acoustic model score, computed by the trained acoustic model. The prior probability $P(W)$ is the language model score, which acts as a penalty applied to the different acoustic models. $P(X)$ is the total probability, which is constant once the acoustic model and the prefix word detection network are fixed. On this basis, the prefix word confirmation network is implemented as follows:
a) Decode the detected prefix word down to the phoneme level and record all scores:
$(\mathrm{Score}_{phone_1}, \mathrm{Score}_{phone_2}, \ldots, \mathrm{Score}_{phone_N})$, where $N$ is the total number of phonemes in the prefix word and $\mathrm{Score}_{phone_1}, \mathrm{Score}_{phone_2}, \ldots, \mathrm{Score}_{phone_N}$ are the decoding scores of the individual phonemes in the prefix word.
b) Compute the confidence score of each phoneme of the prefix word, as follows:
[formula not reproduced in the source text] where $K_{i}^{start}$ and $K_{i}^{end}$ are the start time and end time of the i-th phoneme; $CM_{phone_i}$ is the confidence score of the i-th phoneme, the subscript $phone_i$ denoting the i-th phoneme; $\mathrm{Score}_{phone_i}$ is the decoding score of the i-th phoneme as defined above; and $\mathrm{Score}_{frame_k}$ is the score of the k-th frame obtained by decoding with the prefix word confirmation network.
c) Compute the final confirmation score $CM_{word}$ of the prefix word, as follows: [formula not reproduced in the source text]
To improve the efficiency and accuracy of prefix word detection, training of the acoustic model can be divided into two parts: a prefix word model and a filler model. The prefix word model can be trained in the same way as acoustic models in conventional speech recognition: a database is chosen, and the model is obtained using MLE (Maximum Likelihood Estimation) and discriminative training under the MPE (Minimum Phone Error) criterion. The filler model is used to absorb all speech other than prefix words. Accordingly, performing prefix word detection on the audio data starting from the front endpoint of speech in step S3 may further include: detecting, based on a parallel search network that includes the prefix word model and the filler model, whether prefix-word speech is present in the audio data starting from the front endpoint of speech.
Those skilled in the art will understand that the invention can also detect prefix-word speech using other keyword detection techniques commonly used in the field of voice interaction; embodiments of the invention are not limited in this respect.
Step S4: Obtain the speech segment in the audio data that starts at the front endpoint of the prefix-word speech as the voice instruction, until an instruction acquisition termination event is detected, so that the prefix word and the voice instruction are combined.
Here, the recording of step S1 continues uninterrupted after prefix-word speech is detected (that is, after wake-up succeeds). Acquisition of the voice instruction is triggered by the successful wake-up, and this step obtains the speech segment directly from the audio data in the circular buffer after wake-up succeeds.
To make the speech segment easy to obtain, after prefix-word speech is detected, the storage address of the back endpoint of the prefix-word speech in the circular buffer and the length of the prefix-word speech can be recorded. The storage address of the front endpoint of the prefix-word speech in the circular buffer can then be calculated, so that the speech segment starting at that front endpoint can be obtained accurately.
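The address calculation described above amounts to modular arithmetic on the circular buffer. The sketch below assumes that the back endpoint address points at the last sample of the prefix word (inclusive) and that addresses are slot indices in the range 0 to capacity - 1; the function names are hypothetical.

```python
def prefix_front_address(back_address, prefix_length, capacity):
    """Return the buffer slot holding the first sample of the prefix word.

    back_address is assumed to be the slot of the prefix word's last sample.
    """
    if not 0 < prefix_length <= capacity:
        raise ValueError("prefix word must fit inside the buffer")
    return (back_address - prefix_length + 1) % capacity


def read_from(buffer, front_address, write_address):
    """Collect samples from front_address up to, not including, write_address.

    write_address == front_address is treated as an empty range.
    """
    capacity = len(buffer)
    count = (write_address - front_address) % capacity
    return [buffer[(front_address + i) % capacity] for i in range(count)]


ring = list("abcdefgh")                 # 8 slots; letters stand in for samples
front = prefix_front_address(back_address=1, prefix_length=4, capacity=8)
segment = read_from(ring, front, write_address=3)
```

In this toy example the prefix word occupies slots 6, 7, 0 and 1, so the front endpoint wraps to slot 6 and the returned segment runs from there to the current write position.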
Step S5: Perform speech recognition on the voice instruction to obtain a speech recognition result.
Step S6: Determine whether the speech recognition result is valid. If it is valid, perform the operation corresponding to the speech recognition result. If it is invalid, end the current voice interaction; in that case the user can be notified that the interaction failed and reminded to input a correct voice instruction again.
Because the voice interaction method of the invention takes the speech segment in the audio data that starts at the front endpoint of the prefix-word speech as the voice instruction, and uses words that reflect the type of action to be performed as prefix words, the prefix word and the voice instruction are combined. This avoids the failure to obtain a valid recognition result that is caused by forcibly cutting the voice instruction, which improves the efficiency of voice interaction. Moreover, because the prefix words follow everyday speaking habits, the user does not have to memorize a fixed wake word; simply saying the action to be performed, in ordinary language, both wakes up voice interaction and triggers the action, which further improves ease of use.
To improve the accuracy of front endpoint detection, prefix word detection, and speech recognition, and to make the voice interaction method more robust to interference, the method can also perform noise reduction on the audio data before endpoint detection, producing clean audio data. In that case, step S3 performs prefix word detection on the clean audio data starting from the front endpoint of speech, and step S4 obtains, as the voice instruction, the speech segment in the clean audio data that starts at the front endpoint of the prefix-word speech.
Determining whether the speech recognition result is valid in step S6 may further include the following steps:
Step S61: Load the command word network.
The method of the invention allows users to extend the command word network as needed.
Step S62: Determine whether the command word network contains a command word that matches the speech recognition result and, if so, determine that the speech recognition result is valid.
Here, a matching score between the speech recognition result and each command word can be obtained by computing the similarity between them. If a matching score exceeds a set threshold, the speech recognition result is considered valid; otherwise it is considered invalid.
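The threshold test on matching scores can be sketched with a simple text similarity measure. The source does not say which similarity measure is used; `difflib.SequenceMatcher` from the Python standard library is substituted here purely for illustration, and the command list and the 0.8 threshold are made-up values.

```python
from difflib import SequenceMatcher


def match_command(recognized, command_words, threshold=0.8):
    """Return (best_command, score) if the best score exceeds the threshold,
    otherwise (None, score)."""
    best_word, best_score = None, 0.0
    for word in command_words:
        # ratio() is a similarity in [0, 1]; 1.0 means identical strings.
        score = SequenceMatcher(None, recognized, word).ratio()
        if score > best_score:
            best_word, best_score = word, score
    if best_score > threshold:
        return best_word, best_score   # recognition result is valid
    return None, best_score            # recognition result is invalid


COMMANDS = ["call Zhang San", "open WeChat"]
hit, hit_score = match_command("call Zhang San", COMMANDS)
miss, miss_score = match_command("play some jazz", COMMANDS)
```

A deployed system would more likely compare against command templates with slots (for example a contact name) rather than fixed strings, but the accept/reject decision has the same shape.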
The instruction acquisition termination event can be set as needed and includes, for example, the speech segment ending and the speech segment having lasted for a set time. Therefore, once prefix-word speech is detected, speech recognition, back endpoint detection, and duration timing can all be carried out at the same time on the speech segment that starts at the front endpoint of the prefix-word speech. Depending on the application, those skilled in the art can set the set time to a fixed value or let the user specify it. The set time is typically chosen in the range 800 ms to 2000 ms, for example 1000 ms. The speech segment ending means that the back endpoint of the speech segment has been detected. If the back endpoint has still not been detected when the speech segment has lasted for the set time, the speech segment is also considered to have ended. Here, the start and end of each speech segment correspond to its front endpoint and back endpoint respectively: the front endpoint is the boundary frame from a non-speech segment to a speech segment, and the back endpoint is the boundary frame from a speech segment to a non-speech segment. A speech segment is therefore a continuous run of frames of some length that all meet the criterion for speech.
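The two termination conditions (back endpoint detected, or the set time reached) can be sketched as a single pass over per-frame speech decisions. The function name, the frame-level boolean input, and the rule that a single non-speech frame marks the back endpoint are simplifying assumptions; the 1000 ms default is the example value given above.

```python
def instruction_end(is_speech, frame_ms, set_time_ms=1000):
    """Find where instruction capture stops.

    is_speech: per-frame booleans, starting at the prefix word's front endpoint.
    Returns (frame_index, reason), where reason is "back_endpoint", "timeout",
    or "still_recording" (frame_index is None in the last case).
    """
    for i, speech in enumerate(is_speech):
        if not speech:
            return i, "back_endpoint"      # boundary from speech to non-speech
        elapsed_ms = (i + 1) * frame_ms
        if elapsed_ms >= set_time_ms:
            return i, "timeout"            # segment has lasted the set time
    return None, "still_recording"


ended = instruction_end([True] * 5 + [False], frame_ms=100)
timed_out = instruction_end([True] * 20, frame_ms=100)
pending = instruction_end([True] * 3, frame_ms=100)
```

Real detectors normally require a short run of non-speech frames (a hangover period) before declaring the back endpoint, so that brief pauses inside an instruction do not end capture early.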
Corresponding to the voice interaction method above, the voice interaction system of the invention, as shown in Fig. 2, includes a recording module 1, an endpoint detection module 2, a prefix word detection module 3, a voice activity detection module 4, a speech recognition module 5, a judgment module 6, and an execution module 7. The recording module 1 records the audio data input by the user. The endpoint detection module 2 performs endpoint detection on the audio data until a front endpoint of speech is detected. The prefix word detection module 3 performs prefix word detection on the audio data starting from the front endpoint of speech until prefix-word speech is detected, where the prefix word is a word that reflects the type of action to be performed. The voice activity detection module 4 obtains the speech segment in the audio data that starts at the front endpoint of the prefix-word speech as the voice instruction, until an instruction acquisition termination event is detected. The speech recognition module 5 performs speech recognition on the voice instruction to obtain a speech recognition result. The judgment module 6 determines whether the speech recognition result is valid. The execution module 7 executes a valid speech recognition result.
The system of the invention may further include a noise reduction module (not shown). The noise reduction module is connected to the recording module 1 and to the endpoint detection module 2; it performs noise reduction on the audio data recorded by the recording module 1 and sends the noise-reduced audio data to the endpoint detection module 2.
Further, the prefix word detection module 3 can also be used to detect, based on a parallel search network that includes a prefix word model and a filler model, whether the prefix-word speech is present in the audio data starting from the front endpoint of speech.
Further, the judgment module 6 can also be used to determine whether the command word network contains a command word that matches the speech recognition result and, if so, to determine that the speech recognition result is valid.
The instruction acquisition termination event may include, for example, the speech segment ending and the speech segment having lasted for a set time. In that case, the endpoint detection module 2 can also be used to detect the back endpoint of the speech segment and the duration of the speech segment.
The embodiments in this specification are described progressively. For parts that are the same or similar across embodiments, the embodiments may be read with reference to one another; each embodiment focuses on how it differs from the others. The system embodiment in particular is described briefly because it is substantially similar to the method embodiment; for the related parts, see the description of the method embodiment. The system embodiment described above is only schematic. Modules or units described as separate components may or may not be physically separate, and components shown as modules or units may or may not be physical units; they may be located in one place or distributed over multiple network elements. Some or all of the modules can be selected as needed to achieve the purpose of the embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The construction, features, and effects of the invention have been described in detail above with reference to the embodiments shown in the drawings. The above are only preferred embodiments of the invention, and the scope of the invention is not limited to what the drawings show. Any change made according to the concept of the invention, or any modification into an equivalent embodiment, that does not depart from the spirit covered by the specification and drawings falls within the scope of protection of the invention.

Claims (10)

  1. A voice interaction method, characterized in that it includes:
    Recording audio data input by a user;
    Performing endpoint detection on the audio data until a front endpoint of speech is detected;
    Performing prefix word detection on the audio data starting from the front endpoint of speech until prefix-word speech is detected, where the prefix word is a word that reflects the type of action to be performed, and the prefix word is combined with the voice instruction that expresses the user's intent;
    Obtaining the speech segment in the audio data that starts at the front endpoint of the prefix-word speech as the voice instruction, until an instruction acquisition termination event is detected;
    Performing speech recognition on the voice instruction to obtain a speech recognition result;
    Determining whether the speech recognition result is valid and, if it is, performing the operation corresponding to the speech recognition result.
  2. The method according to claim 1, characterized in that the method further includes:
    Performing noise reduction on the audio data before performing endpoint detection on it.
  3. The method according to claim 1, characterized in that performing prefix word detection on the audio data starting from the front endpoint of speech includes:
    Detecting, based on a parallel search network that includes a prefix word model and a filler model, whether the prefix-word speech is present in the audio data starting from the front endpoint of speech.
  4. The method according to claim 1, characterized in that determining whether the speech recognition result is valid includes:
    Determining whether a command word network contains a command word that matches the speech recognition result and, if so, determining that the speech recognition result is valid.
  5. The voice interaction method according to any one of claims 1 to 4, characterized in that the instruction acquisition termination event includes: the speech segment ending, and the speech segment having lasted for a set time.
  6. A voice interaction system, characterized by comprising:
    A recording module, configured to record the voice data input by a user;
    An endpoint detection module, configured to perform endpoint detection on the voice data until a speech front-end point is detected;
    A prefix word detection module, configured to perform prefix word detection on the voice data starting from the speech front-end point until prefix word speech is detected, wherein the prefix word is a word reflecting the type of action to be performed, and the prefix word is combined with the voice instruction that expresses the user's intent;
    A voice activity detection module, configured to acquire, as the voice instruction, the speech segment of the voice data starting from the front-end point of the prefix word speech, until an instruction-acquisition termination event is detected;
    A speech recognition module, configured to perform speech recognition on the voice instruction to obtain a speech recognition result;
    A judgment module, configured to judge whether the speech recognition result is valid; and
    An execution module, configured to perform the operation corresponding to a valid speech recognition result.
  7. The system according to claim 6, characterized in that the system further comprises:
    A noise reduction module, connected to the recording module and to the endpoint detection module respectively, configured to perform noise reduction on the voice data recorded by the recording module and to send the noise-reduced voice data to the endpoint detection module.
  8. The system according to claim 6, characterized in that the prefix word detection module is specifically configured to detect, based on a parallel search network comprising a prefix word model and a filler model, whether the prefix word speech is present in the voice data starting from the speech front-end point.
  9. The system according to claim 6, characterized in that the judgment module is specifically configured to judge whether a command word matching the speech recognition result exists in a command word network and, if so, to determine that the speech recognition result is valid.
  10. The system according to any one of claims 6 to 9, characterized in that the instruction-acquisition termination event comprises: the speech segment ending, and the speech segment lasting for a set duration.
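One way the modules of claims 6 to 10 might be chained is sketched below in Python, with a prefix word model competing against a filler model to gate the flow. Everything in it is hypothetical. The per-frame log-likelihoods are assumed to come from some acoustic model, and the sliding-window score comparison is a simplified stand-in for decoding over a parallel search network. The names `detect_prefix`, `interact`, `window`, and `threshold` do not come from the patent.

```python
from typing import Callable, Optional, Sequence, Set


def detect_prefix(prefix_ll: Sequence[float], filler_ll: Sequence[float],
                  window: int, threshold: float) -> Optional[int]:
    """Return the first frame at which a window starts where the prefix word
    model outscores the filler model by more than `threshold`, else None."""
    n = min(len(prefix_ll), len(filler_ll))
    for start in range(0, n - window + 1):
        ratio = (sum(prefix_ll[start:start + window])
                 - sum(filler_ll[start:start + window]))
        if ratio > threshold:
            return start  # taken as the front-end point of the prefix word speech
    return None


def interact(prefix_ll: Sequence[float], filler_ll: Sequence[float],
             recognize: Callable[[int], str], command_words: Set[str],
             execute: Callable[[str], None],
             window: int = 3, threshold: float = 2.0) -> bool:
    """Detect the prefix word, recognize from its front-end point,
    validate the result, and execute it only if it is valid."""
    start = detect_prefix(prefix_ll, filler_ll, window, threshold)
    if start is None:
        return False  # no prefix word speech: nothing is recognized
    text = recognize(start)  # the segment includes the prefix word itself
    if text not in command_words:
        return False  # invalid result: no operation is performed
    execute(text)
    return True
```

Because recognition starts at the front-end point of the prefix word rather than after it, the prefix word and the rest of the instruction are recognized as one utterance. The claims describe this as the prefix word being combined with the voice instruction.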
CN201410782284.4A 2014-12-16 2014-12-16 A kind of voice interactive method and system Active CN104464723B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410782284.4A CN104464723B (en) 2014-12-16 2014-12-16 A kind of voice interactive method and system

Publications (2)

Publication Number Publication Date
CN104464723A CN104464723A (en) 2015-03-25
CN104464723B true CN104464723B (en) 2018-03-20

Family

ID=52910674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410782284.4A Active CN104464723B (en) 2014-12-16 2014-12-16 A kind of voice interactive method and system

Country Status (1)

Country Link
CN (1) CN104464723B (en)

Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106782547B (en) * 2015-11-23 2020-08-07 芋头科技(杭州)有限公司 Robot semantic recognition system based on voice recognition
CN105529028B (en) * 2015-12-09 2019-07-30 百度在线网络技术(北京)有限公司 Speech analysis method and apparatus
CN106887227A (en) * 2015-12-16 2017-06-23 芋头科技(杭州)有限公司 A kind of voice awakening method and system
CN105869637B (en) 2016-05-26 2019-10-15 百度在线网络技术(北京)有限公司 Voice awakening method and device
CN105931639B (en) * 2016-05-31 2019-09-10 杨若冲 A kind of voice interactive method for supporting multistage order word
CN106157950A (en) * 2016-09-29 2016-11-23 合肥华凌股份有限公司 Speech control system and awakening method, Rouser and household electrical appliances, coprocessor
CN106653013B (en) * 2016-09-30 2019-12-20 北京奇虎科技有限公司 Voice recognition method and device
CN106571144A (en) * 2016-11-08 2017-04-19 广东小天才科技有限公司 Search method and device based on voice recognition
CN107145329A (en) * 2017-04-10 2017-09-08 北京猎户星空科技有限公司 Apparatus control method, device and smart machine
US10311874B2 (en) 2017-09-01 2019-06-04 4Q Catalyst, LLC Methods and systems for voice-based programming of a voice-controlled device
CN107731226A (en) * 2017-09-29 2018-02-23 杭州聪普智能科技有限公司 Control method, device and electronic equipment based on speech recognition
CN108172219B (en) * 2017-11-14 2021-02-26 珠海格力电器股份有限公司 Method and device for recognizing voice
CN107886944B (en) * 2017-11-16 2021-12-31 出门问问创新科技有限公司 Voice recognition method, device, equipment and storage medium
CN107919124B (en) * 2017-12-22 2021-07-13 北京小米移动软件有限公司 Equipment awakening method and device
CN110299137B (en) * 2018-03-22 2023-12-12 腾讯科技(深圳)有限公司 Voice interaction method and device
CN108735210A (en) * 2018-05-08 2018-11-02 宇龙计算机通信科技(深圳)有限公司 A kind of sound control method and terminal
WO2019222996A1 (en) * 2018-05-25 2019-11-28 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for voice recognition
CN108922531B (en) * 2018-07-26 2020-10-27 腾讯科技(北京)有限公司 Slot position identification method and device, electronic equipment and storage medium
CN109147779A (en) * 2018-08-14 2019-01-04 苏州思必驰信息科技有限公司 Voice data processing method and device
JP6992713B2 (en) * 2018-09-11 2022-01-13 日本電信電話株式会社 Continuous utterance estimation device, continuous utterance estimation method, and program
CN109147764A (en) * 2018-09-20 2019-01-04 百度在线网络技术(北京)有限公司 Voice interactive method, device, equipment and computer-readable medium
CN108962250A (en) * 2018-09-26 2018-12-07 出门问问信息科技有限公司 Audio recognition method, device and electronic equipment
CN111063356B (en) * 2018-10-17 2023-05-09 北京京东尚科信息技术有限公司 Electronic equipment response method and system, sound box and computer readable storage medium
CN109887493B (en) * 2019-03-13 2021-08-31 安徽声讯信息技术有限公司 Character audio pushing method
CN110930989B (en) * 2019-11-27 2021-04-06 深圳追一科技有限公司 Speech intention recognition method and device, computer equipment and storage medium
CN111524512A (en) * 2020-04-14 2020-08-11 苏州思必驰信息科技有限公司 Method for starting one-shot voice conversation with low delay, peripheral equipment and voice interaction device with low delay response
CN113643691A (en) * 2021-08-16 2021-11-12 思必驰科技股份有限公司 Far-field voice message interaction method and system
CN113971953A (en) * 2021-09-17 2022-01-25 珠海格力电器股份有限公司 Voice command word recognition method and device, storage medium and electronic equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102999161B (en) * 2012-11-13 2016-03-02 科大讯飞股份有限公司 A kind of implementation method of voice wake-up module and application
CN103077165A (en) * 2012-12-31 2013-05-01 威盛电子股份有限公司 Natural language dialogue method and system thereof
CN103220423A (en) * 2013-04-10 2013-07-24 威盛电子股份有限公司 Voice answering method and mobile terminal device
CN103595869A (en) * 2013-11-15 2014-02-19 华为终端有限公司 Terminal voice control method and device and terminal
CN103632667B (en) * 2013-11-25 2017-08-04 华为技术有限公司 acoustic model optimization method, device and voice awakening method, device and terminal
CN103714815A (en) * 2013-12-09 2014-04-09 何永 Voice control method and device thereof

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109708256A (en) * 2018-12-06 2019-05-03 珠海格力电器股份有限公司 Voice determination method and device, storage medium and air conditioner
CN109708256B (en) * 2018-12-06 2020-07-03 珠海格力电器股份有限公司 Voice determination method and device, storage medium and air conditioner

Also Published As

Publication number Publication date
CN104464723A (en) 2015-03-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant