CN104464723B - A kind of voice interactive method and system - Google Patents
Abstract
The invention discloses a voice interaction method and system. The method includes: recording audio data input by a user; performing end-point detection on the audio data until a speech front-end point is detected; performing prefix-word detection on the audio data starting from the speech front-end point until a prefix-word sound is detected, the prefix word being a word that reflects the action to be performed; acquiring, as the voice instruction, the speech segment of the audio data starting from the front end of the prefix-word sound; performing speech recognition on the voice instruction; and, if the speech recognition result is valid, performing the operation corresponding to the result. Because the method and system take the speech segment starting from the front end of the prefix-word sound as the voice instruction, and use a word reflecting the action to be performed as the prefix word, the prefix word and the voice instruction are combined; this effectively avoids the failure to obtain a valid speech recognition result caused by forcibly cutting the voice instruction, and improves the efficiency of voice interaction.
Description
Technical field
The present invention relates to the field of voice interaction, and more particularly to a voice interaction method and system.
Background technology
To prevent nearby speech from being misrecognized as a voice instruction while a mobile device such as a phone is in standby, the mobile device must complete the following operations each time the user starts its voice interaction function: 1. record audio data input by the user; 2. run wake-up detection on the audio data until wake-up succeeds; 3. after a successful wake-up, prompt the user to input a voice instruction; 4. after the prompt, record the user's audio data again; 5. extract the speech segment from the newly recorded audio data as the voice instruction; 6. perform speech recognition on the voice instruction to obtain a recognition result; 7. determine whether the recognition result is valid, and execute it if so. Correspondingly, each time the user starts the voice interaction function, the user must: 1. say the wake-up word to wake the device; 2. say the voice instruction, e.g. "call Zhang San", when the device prompts for it. This kind of voice interaction method is therefore inconvenient to use.
To address this inconvenience, a wake-up-word-based voice interaction method has also been proposed in which, after a successful wake-up, the device directly processes the voice instruction spoken continuously after the wake-up word. With this method the user only needs to say the wake-up word and the voice instruction in one breath; for example, to call Zhang San, the user says the fixed, preset wake-up word immediately followed by "call Zhang San", the voice instruction. Although this method has some advantage in convenience, users generally speak continuously, so the wake-up word tends to run together with the following voice instruction. Forcibly cutting the audio at the point where wake-up succeeds and taking the remaining speech segment as the voice instruction is therefore likely to leave the instruction incomplete, causing the speech recognition module to produce an invalid result and lowering its recognition accuracy, which in turn reduces the efficiency of voice interaction. Moreover, this method works only with a fixed wake-up word that the user must memorize by rote, without which the interaction cannot be started at all, so its convenience still needs further improvement.
The content of the invention
The embodiments of the present invention aim to overcome the low efficiency of voice interaction in existing voice interaction methods, and provide an efficient, prefix-word-based voice interaction method.
To achieve the above object, the technical solution adopted by the present invention is a voice interaction method, including:
recording audio data input by a user;
performing end-point detection on the audio data until a speech front-end point is detected;
performing prefix-word detection on the audio data starting from the speech front-end point until a prefix-word sound is detected, wherein the prefix word is a word reflecting the action to be performed, and the prefix word is combined with the voice instruction that expresses the user's intention;
acquiring, as the voice instruction, the speech segment of the audio data starting from the front end of the prefix-word sound, until an instruction-acquisition termination event is detected;
performing speech recognition on the voice instruction to obtain a speech recognition result;
judging whether the speech recognition result is valid, and if so, performing the operation corresponding to the speech recognition result.
Preferably, the method also includes: performing noise reduction on the audio data before performing end-point detection on the audio data.
Preferably, performing prefix-word detection on the audio data starting from the speech front-end point includes: detecting, based on a parallel search network comprising a prefix-word model and a filler model, whether the prefix-word sound is present in the audio data starting from the speech front-end point.
Preferably, judging whether the speech recognition result is valid includes: judging whether a command word matching the speech recognition result exists in a command-word network, and if so, judging the speech recognition result to be valid.
Preferably, the instruction-acquisition termination event includes: the speech segment ending, or the speech segment having lasted a preset time.
To achieve the above object, the technical solution adopted by the present invention is a voice interaction system, including:
a recording module for recording audio data input by a user;
an end-point detection module for performing end-point detection on the audio data until a speech front-end point is detected;
a prefix-word detection module for performing prefix-word detection on the audio data starting from the speech front-end point until a prefix-word sound is detected, wherein the prefix word is a word reflecting the action to be performed, and the prefix word is combined with the voice instruction that expresses the user's intention;
a speech-segment acquisition module for acquiring, as the voice instruction, the speech segment of the audio data starting from the front end of the prefix-word sound, until an instruction-acquisition termination event is detected;
a speech recognition module for performing speech recognition on the voice instruction to obtain a speech recognition result;
a judging module for judging whether the speech recognition result is valid; and
an execution module for performing the operation corresponding to a valid speech recognition result.
Preferably, the system also includes: a noise reduction module, connected to the recording module and the end-point detection module, for performing noise reduction on the audio data recorded by the recording module and transmitting the noise-reduced audio data to the end-point detection module.
Preferably, the prefix-word detection module is specifically configured to detect, based on a parallel search network comprising a prefix-word model and a filler model, whether the prefix-word sound is present in the audio data starting from the speech front-end point.
Preferably, the judging module is specifically configured to judge whether a command word matching the speech recognition result exists in a command-word network, and if so, to judge the speech recognition result to be valid.
Preferably, the instruction-acquisition termination event includes: the speech segment ending, or the speech segment having lasted a preset time.
The beneficial effects of the present invention are as follows. Because the voice interaction method and system of the invention take as the voice instruction the speech segment of the audio data starting from the front end of the prefix-word sound, and use as the prefix word a word reflecting the action to be performed, such as "call", "send a text message to", or "open QQ", the prefix word and the voice instruction are combined. This not only effectively avoids the failure to obtain a valid speech recognition result caused by forcibly cutting the voice instruction, improving the efficiency of voice interaction, but also, by using words that fit everyday speech habits as prefix words, frees the user from rote memorization of a fixed wake-up word: simply saying the desired action as one naturally would both wakes up the interaction and triggers execution of the action, further improving the convenience of voice interaction.
Brief description of the drawings
Fig. 1 shows a flow chart of an embodiment of the voice interaction method according to the present invention;
Fig. 2 shows a block diagram of an implementation structure of the voice interaction system according to the present invention.
Embodiment
Embodiments of the invention are described in detail below, with examples shown in the drawings, where identical or similar reference numbers denote identical or similar elements, or elements with identical or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present invention, and are not to be construed as limiting the claims.
To solve the problem that forcibly cutting the voice instruction reduces the efficiency of voice interaction in existing methods, the present invention provides a more efficient voice interaction method which, as shown in Fig. 1, comprises the following steps.
Step S1: record the audio data input by the user.
Here, the recorded audio data may be stored in a fixed-length circular buffer, with the storage address recorded so that subsequent steps can retrieve the audio data.
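As a minimal sketch of the fixed-length circular buffer with recorded storage addresses described above (all names and sizes here are illustrative, not from the patent), absolute write addresses let later steps locate a segment even after the buffer wraps:

```python
from array import array

class RingBuffer:
    """Fixed-length circular buffer for recorded samples; keeps a running
    absolute write address so later steps can fetch a segment by address."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = array('h', [0] * capacity)  # 16-bit PCM samples
        self.write_addr = 0  # absolute sample count since recording started

    def write(self, samples):
        for s in samples:
            self.data[self.write_addr % self.capacity] = s
            self.write_addr += 1

    def read(self, start_addr, length):
        # Valid only while the requested span has not been overwritten.
        assert self.write_addr - start_addr <= self.capacity
        return [self.data[(start_addr + i) % self.capacity]
                for i in range(length)]

buf = RingBuffer(8)
buf.write([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])  # wraps around after 8 samples
print(buf.read(6, 4))  # the last 4 samples written: [7, 8, 9, 10]
```

Because addresses are absolute, the segment starting at the prefix-word front end can be read back as long as recording has not overrun the buffer.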
Step S2: perform end-point detection on the audio data until a speech front-end point is detected.
The speech front-end point is the boundary frame between a non-speech segment and a speech segment. When processing the audio data, the audio is first divided into frames; an energy feature is then computed for each frame, and a frame whose energy exceeds a set value is considered speech, otherwise non-speech.
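The energy-based front-end point detection just described can be sketched as follows (frame length and threshold are illustrative assumptions; real values depend on the sampling rate and noise floor):

```python
def frame_energies(samples, frame_len):
    """Split samples into non-overlapping frames; compute per-frame energy."""
    return [sum(s * s for s in samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def front_end_point(samples, frame_len=4, threshold=100):
    """Return the index of the first frame whose energy exceeds the threshold,
    i.e. the non-speech-to-speech boundary frame, or None if there is none."""
    for k, e in enumerate(frame_energies(samples, frame_len)):
        if e > threshold:
            return k
    return None

silence = [1, -1, 0, 1] * 3      # three low-energy frames
speech = [20, -18, 25, -22]      # one high-energy frame
print(front_end_point(silence + speech))  # → 3
```

The symmetric check (first frame whose energy drops back below the threshold) yields the back-end point used later in step S4.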
Here, the audio data can be stored into the circular buffer continuously as recording proceeds, and as it accumulates it can be read from the circular buffer continuously for end-point detection. End-point detection can therefore run essentially in parallel with storing the recorded audio data into the circular buffer, improving processing efficiency.
Step S3: perform prefix-word detection on the audio data starting from the speech front-end point until a prefix-word sound is detected, where the prefix word is a word reflecting the action to be performed, so that the prefix word that wakes up the voice interaction can be organically combined with the voice instruction expressing the user's intention. Words reflecting an action to be performed are, for example, words that fit everyday speech habits such as "call", "send a text message to", "open QQ", or "open WeChat".
The main function of prefix-word detection is to decide whether to wake up the voice interaction: once a prefix-word sound is detected, speech recognition is started so that the corresponding action can be performed according to the user's intention.
The prefix-word detection method may, for example, include the following steps.
Step S31, acoustic feature extraction: extract from the audio (prefix-word detection is generally performed in units of speech segments) features that are discriminative and based on the characteristics of human hearing; the MFCC (Mel-Frequency Cepstrum Coefficient) features commonly used in speech recognition are generally chosen as the acoustic features.
Step S32, prefix-word detection: using a trained acoustic model, compute acoustic scores for the extracted acoustic features on the prefix-word detection network. If the optimal path by acoustic score contains a prefix word to be detected, a prefix word has been detected; otherwise return to step S31 and continue extracting acoustic features.
To reduce the false-detection rate of prefix words, the following step S33 may also be performed, on the basis of steps S31 and S32, after a prefix word has been detected.
Step S33, prefix-word confirmation: using the trained acoustic model, run the extracted acoustic features through the prefix-word confirmation network to obtain a final confirmation score. To judge whether the detected prefix word is a real prefix word, the final confirmation score is compared with a preset threshold: if the final confirmation score is greater than or equal to the threshold, the prefix word is considered real and the voice wake-up succeeds; if the final confirmation score is below the threshold, the prefix word is considered false, and the process returns to step S31 to continue extracting acoustic features.
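The two-stage detect-then-confirm flow of steps S32 and S33 can be sketched as below; the word set, the dummy decoder scores, and the threshold are all hypothetical stand-ins, since the real acoustic models and search networks are not given in code form:

```python
def detect_prefix_word(acoustic_scores, prefix_words):
    """Stage 1 (stand-in for step S32): take the best-scoring word hypothesis
    and report it as a candidate only if it is a known prefix word."""
    best = max(acoustic_scores, key=acoustic_scores.get)
    return best if best in prefix_words else None

def confirm_prefix_word(confirm_score, threshold):
    """Stage 2 (step S33): accept the candidate only if the final
    confirmation score reaches the preset threshold."""
    return confirm_score >= threshold

PREFIX_WORDS = {"call", "text", "open"}        # illustrative prefix-word set
scores = {"call": 0.91, "colour": 0.40}        # dummy decoder scores
candidate = detect_prefix_word(scores, PREFIX_WORDS)
print(candidate)                               # → call
print(confirm_prefix_word(0.85, threshold=0.7))  # → True: wake-up succeeds
```

Rejecting the candidate in stage 2 corresponds to returning to step S31 and continuing feature extraction.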
Here, words reflecting actions to be performed that fit everyday speech habits can be added to the prefix-word detection network and the prefix-word confirmation network. In addition, the method of the present invention also allows the user to add such words to the prefix-word detection network and the prefix-word confirmation network according to personal speech habits. The method of the present invention is thus no longer limited to a fixed wake-up word, further improving the convenience of the present invention.
The prefix-word detection network described above can be implemented by computing the optimal-score path. The calculation formula for the optimal path is

$$W^{*} = \arg\max_{W} P(W \mid X) = \arg\max_{W} \frac{P(X \mid W)\,P(W)}{P(X)}$$

where $X$ denotes the acoustic feature vector extracted from the audio data, and $W^{*}$ denotes the optimal word sequence with the maximum score. The conditional probability $P(X \mid W)$ is the acoustic model score, computed with the trained acoustic model; the prior probability $P(W)$ is the language model score; and the full probability $P(X)$ serves as the penalty applied across different acoustic models, which is a constant once the acoustic model and the prefix-word detection network are fixed. On this basis, the prefix-word confirmation network is implemented as follows:
a) Decode the detected prefix word down to the phoneme level and record all scores $(Score_{phone_1}, Score_{phone_2}, \ldots, Score_{phone_N})$, where $N$ is the total number of phonemes in the prefix word and $Score_{phone_1}, Score_{phone_2}, \ldots, Score_{phone_N}$ are the decoding scores of the individual phonemes.
b) Compute the confirmation score of each phoneme of the prefix word as

$$CM_{phone_i} = \frac{1}{K_{i,end} - K_{i,start} + 1}\left( Score_{phone_i} - \sum_{k=K_{i,start}}^{K_{i,end}} Score_{frame_k} \right)$$

where $K_{i,start}$ and $K_{i,end}$ are the start and end times of the $i$-th phoneme, $CM_{phone_i}$ is the confirmation score of the $i$-th phoneme, $Score_{phone_i}$ is its decoding score as above, and $Score_{frame_k}$ is the score of the $k$-th frame obtained by decoding with the prefix-word confirmation network.
c) Compute the final confirmation score $CM_{word}$ of the prefix word as

$$CM_{word} = \frac{1}{N} \sum_{i=1}^{N} CM_{phone_i}$$
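A minimal numeric sketch of these confirmation scores, assuming the duration-normalized difference form given above (the patent's original equations are rendered as images and are reconstructed here, so this form is an assumption) and using dummy log-domain scores:

```python
def phone_confidence(phone_score, frame_scores):
    """Per-phoneme confirmation score: decoding score minus the summed
    confirmation-network frame scores, normalized by frame count
    (assumed, reconstructed form of CM_phone_i)."""
    return (phone_score - sum(frame_scores)) / len(frame_scores)

def word_confidence(phone_confidences):
    """Word-level confirmation score CM_word: mean of phone-level scores."""
    return sum(phone_confidences) / len(phone_confidences)

# Dummy log-domain scores for a two-phoneme prefix word.
cm1 = phone_confidence(-10.0, [-4.0, -4.0])        # → -1.0
cm2 = phone_confidence(-6.0, [-2.0, -2.0, -2.0])   # → 0.0
cm_word = word_confidence([cm1, cm2])              # → -0.5
print(cm_word >= -1.0)  # compare against a preset wake-up threshold
```

The final comparison plays the role of the threshold test in step S33.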
To improve the efficiency and accuracy of prefix-word detection, the training of the above acoustic model can be divided into two parts: a prefix-word model and a filler model. The prefix-word model can be trained with the conventional acoustic-model training methods of speech recognition, i.e., obtained from a selected database under the MLE (Maximum Likelihood Estimation) and MPE (Minimum Phone Error) discriminative training criteria; the filler model, in turn, absorbs speech other than the prefix words. Accordingly, performing prefix-word detection on the audio data starting from the speech front-end point in step S3 may further comprise: detecting, based on a parallel search network comprising the prefix-word model and the filler model, whether a prefix-word sound is present in the audio data starting from the speech front-end point.
Those skilled in the art will understand that the present invention may also detect the prefix-word sound with other word-detection means commonly used in the field of voice interaction; the embodiments of the present invention are not limited in this respect.
Step S4: acquire, as the voice instruction, the speech segment of the audio data starting from the front end of the prefix-word sound, until an instruction-acquisition termination event is detected, thereby combining the prefix word and the voice instruction.
Here, the operation of step S1 continues without interruption after the prefix-word sound is detected (i.e., after a successful wake-up), and the acquisition of the voice instruction is triggered by the successful wake-up: this step directly obtains the speech segment of the audio data from the circular buffer after the wake-up succeeds.
To make the speech segment easy to obtain, after the prefix-word sound is detected, the storage address of the back-end point of the prefix-word sound in the circular buffer and the length of the prefix-word sound can be recorded; the storage address of the front-end point of the prefix-word sound in the circular buffer can then be computed, so that the speech segment of the audio data starting from the front end of the prefix-word sound is obtained accurately.
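The address computation just described is simple arithmetic on the recorded quantities; the sample counts below are made up for illustration:

```python
def prefix_front_address(back_end_addr, prefix_len):
    """Front-end address of the prefix-word sound, derived from its recorded
    back-end address and its length (addresses are absolute sample counts,
    as with the circular-buffer sketch earlier)."""
    return back_end_addr - prefix_len

# e.g. the prefix-word sound ends at absolute sample 4800 and is
# 3200 samples long, so the voice instruction starts at sample 1600.
print(prefix_front_address(4800, 3200))  # → 1600
```

Reading from that address onward in the circular buffer yields the combined prefix-word-plus-instruction segment.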
Step S5: perform speech recognition on the voice instruction to obtain a speech recognition result.
Step S6: judge whether the speech recognition result is valid; if valid, perform the operation corresponding to the speech recognition result; if invalid, end this voice interaction. Here, the user may be reminded that the interaction failed and prompted to input a correct voice instruction again.
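As a purely illustrative summary, steps S1-S6 can be sketched as a pipeline in which each stage is supplied as a callable; all the names and the dummy stages below are hypothetical, not taken from the patent:

```python
def voice_interaction(audio, detect_front_end, detect_prefix,
                      get_segment, recognize, is_valid, execute):
    """Pipeline sketch of steps S1-S6; each stage is an injected callable."""
    front = detect_front_end(audio)            # S2: speech front-end point
    prefix = detect_prefix(audio, front)       # S3: prefix-word sound
    instruction = get_segment(audio, prefix)   # S4: segment from prefix front end
    result = recognize(instruction)            # S5: speech recognition
    if is_valid(result):                       # S6: validity check
        return execute(result)
    return None                                # invalid: interaction ends

# Dummy stages standing in for real audio processing.
steps = dict(
    detect_front_end=lambda a: 0,
    detect_prefix=lambda a, f: (f, 4),                 # (front address, length)
    get_segment=lambda a, p: a[p[0]:],
    recognize=lambda seg: "call zhang san",
    is_valid=lambda r: r.startswith("call"),
    execute=lambda r: "dialing: " + r,
)
print(voice_interaction("call zhang san", **steps))  # → dialing: call zhang san
```

The essential point of the method survives even in this sketch: the segment handed to recognition starts at the prefix-word front end, so the prefix word is never cut away from the instruction.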
Because the voice interaction method of the present invention takes as the voice instruction the speech segment of the audio data starting from the front end of the prefix-word sound, and uses as the prefix word a word reflecting the action to be performed, the prefix word and the voice instruction are combined. This not only effectively avoids the failure to obtain a valid speech recognition result caused by forcibly cutting the voice instruction, improving the efficiency of voice interaction, but also, by using words that fit everyday speech habits as prefix words, frees the user from rote memorization of a fixed wake-up word: simply saying the desired action as one naturally would achieves both the wake-up of the voice interaction and the execution of the action, further improving the convenience of voice interaction.
To improve the accuracy of front-end point detection, prefix-word detection, and speech recognition, and to improve the noise robustness of the method of the present invention, the method may also perform noise reduction on the audio data before end-point detection, obtaining clean audio data. In that case, step S3 specifically performs prefix-word detection on the clean audio data starting from the speech front-end point, and step S4 specifically acquires as the voice instruction the speech segment of the clean audio data starting from the front end of the prefix-word sound.
Judging whether the speech recognition result is valid in step S6 may further comprise the following steps.
Step S61: load the command-word network.
The method of the present invention allows the user to expand the command-word network as needed.
Step S62: judge whether a command word matching the speech recognition result exists in the command-word network, and if so, judge the speech recognition result to be valid.
Here, a matching score between the speech recognition result and each command word can be obtained by computing their similarity; if a matching score exceeds a set threshold, the speech recognition result is considered valid, otherwise it is considered invalid.
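The patent does not specify how similarity is computed, so the sketch below uses string similarity from the standard library as a stand-in; the command words and the threshold are illustrative assumptions:

```python
from difflib import SequenceMatcher

def best_command(recognition_result, command_words, threshold=0.6):
    """Match a recognition result against a command-word network by string
    similarity (difflib stands in for the unspecified scoring method); the
    result is valid only if the best matching score exceeds the threshold."""
    scored = [(SequenceMatcher(None, recognition_result, w).ratio(), w)
              for w in command_words]
    score, word = max(scored)
    return word if score > threshold else None

commands = ["call zhang san", "send text to zhang san", "open qq"]
print(best_command("call zhang san", commands))  # exact match → valid
print(best_command("play music", commands))      # no match → None
```

Returning None here corresponds to judging the recognition result invalid and ending the interaction.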
The instruction-acquisition termination event can be set as needed and includes, for example: the speech segment ending, or the speech segment having lasted a preset time. Accordingly, after the prefix-word sound is detected, speech recognition, back-end point detection, and duration timing can be performed simultaneously on the speech segment of the audio data starting from the front end of the prefix-word sound. Those skilled in the art can set the preset time to a fixed value according to the practical application, or let it be determined by user input; in general, the preset time is chosen in the range of 800 ms to 2000 ms, for example 1000 ms. The end of the speech segment means that its back-end point has been detected; if no back-end point has been detected by the time the segment has lasted the preset time, the segment is also considered ended. Here, the start and end of each speech segment correspond to its front-end point and back-end point respectively: the front-end point is the boundary frame from non-speech to speech, and the back-end point is the boundary frame from speech to non-speech, so a speech segment is a run of consecutive frames of a certain length that all meet the speech criterion.
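The termination logic above reduces to a small predicate over the two conditions; the default of 1000 ms matches the example value in the text, while the function and parameter names are illustrative:

```python
def acquisition_finished(elapsed_ms, back_end_detected, setting_ms=1000):
    """Instruction acquisition terminates when a back-end point is detected,
    or when the segment has lasted the preset time (800-2000 ms typical)."""
    return back_end_detected or elapsed_ms >= setting_ms

print(acquisition_finished(350, back_end_detected=True))    # → True
print(acquisition_finished(350, back_end_detected=False))   # → False
print(acquisition_finished(1200, back_end_detected=False))  # → True
```

The timeout branch is what guarantees the interaction cannot stall if the back-end point detector never fires.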
Corresponding to the above voice interaction method, the voice interaction system of the present invention, as shown in Fig. 2, includes a recording module 1, an end-point detection module 2, a prefix-word detection module 3, a speech-segment acquisition module 4, a speech recognition module 5, a judging module 6, and an execution module 7. The recording module 1 records the audio data input by the user; the end-point detection module 2 performs end-point detection on the audio data until a speech front-end point is detected; the prefix-word detection module 3 performs prefix-word detection on the audio data starting from the speech front-end point until a prefix-word sound is detected, the prefix word being a word reflecting the action to be performed; the speech-segment acquisition module 4 acquires, as the voice instruction, the speech segment of the audio data starting from the front end of the prefix-word sound, until an instruction-acquisition termination event is detected; the speech recognition module 5 performs speech recognition on the voice instruction to obtain a speech recognition result; the judging module 6 judges whether the speech recognition result is valid; and the execution module 7 performs the operation corresponding to a valid speech recognition result.
The system of the present invention may further comprise a noise reduction module (not shown), connected to the recording module 1 and the end-point detection module 2, which performs noise reduction on the audio data recorded by the recording module 1 and transmits the noise-reduced audio data to the end-point detection module 2.
Further, the prefix-word detection module 3 may also detect, based on a parallel search network comprising a prefix-word model and a filler model, whether the prefix-word sound is present in the audio data starting from the speech front-end point.
Further, the judging module 6 may also judge whether a command word matching the speech recognition result exists in a command-word network, and if so, judge the speech recognition result to be valid.
The instruction-acquisition termination event may include, for example, the speech segment ending or the speech segment having lasted a preset time; accordingly, the end-point detection module 2 may also detect the back-end point and the duration of the speech segment.
The embodiments in this specification are described in a progressive manner: identical or similar parts of the embodiments may be understood with reference to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiment is substantially similar to the method embodiment, its description is relatively brief, and the relevant parts may be understood with reference to the description of the method embodiment. The system embodiment described above is merely illustrative: modules or units described as separate components may or may not be physically separate, and components shown as modules or units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiment, which those of ordinary skill in the art can understand and implement without creative effort.
The construction, features, and effects of the present invention have been described in detail above according to the embodiments shown in the drawings. The above are only preferred embodiments of the present invention, and the scope of the invention is not limited to what is shown in the drawings. Any change made according to the concept of the present invention, or any modification into an equivalent embodiment of equivalent variation, that does not depart from the spirit covered by the specification and drawings shall fall within the scope of the present invention.
Claims (10)
- 1. A voice interaction method, characterized by comprising: recording audio data input by a user; performing end-point detection on the audio data until a speech front-end point is detected; performing prefix-word detection on the audio data starting from the speech front-end point until a prefix-word sound is detected, wherein the prefix word is a word reflecting the action to be performed, and the prefix word is combined with the voice instruction that expresses the user's intention; acquiring, as the voice instruction, the speech segment of the audio data starting from the front end of the prefix-word sound, until an instruction-acquisition termination event is detected; performing speech recognition on the voice instruction to obtain a speech recognition result; and judging whether the speech recognition result is valid, and if so, performing the operation corresponding to the speech recognition result.
- 2. The method according to claim 1, characterized in that the method further comprises: performing noise reduction on the audio data before performing end-point detection on the audio data.
- 3. The method according to claim 1, characterized in that performing prefix-word detection on the audio data starting from the speech front-end point comprises: detecting, based on a parallel search network comprising a prefix-word model and a filler model, whether the prefix-word sound is present in the audio data starting from the speech front-end point.
- 4. The method according to claim 1, characterized in that judging whether the speech recognition result is valid comprises: judging whether a command word matching the speech recognition result exists in a command-word network, and if so, judging the speech recognition result to be valid.
- 5. The voice interaction method according to any one of claims 1 to 4, characterized in that the instruction-acquisition termination event comprises: the speech segment ending, or the speech segment having lasted a preset time.
- 6. A voice interaction system, characterized by comprising: a recording module for recording audio data input by a user; an end-point detection module for performing end-point detection on the audio data until a speech front-end point is detected; a prefix-word detection module for performing prefix-word detection on the audio data starting from the speech front-end point until a prefix-word sound is detected, wherein the prefix word is a word reflecting the action to be performed, and the prefix word is combined with the voice instruction that expresses the user's intention; a speech-segment acquisition module for acquiring, as the voice instruction, the speech segment of the audio data starting from the front end of the prefix-word sound, until an instruction-acquisition termination event is detected; a speech recognition module for performing speech recognition on the voice instruction to obtain a speech recognition result; a judging module for judging whether the speech recognition result is valid; and an execution module for performing the operation corresponding to a valid speech recognition result.
- 7. The system according to claim 6, characterized in that the system further comprises: a noise reduction module, connected to the recording module and the end-point detection module, for performing noise reduction on the audio data recorded by the recording module and transmitting the noise-reduced audio data to the end-point detection module.
- 8. system according to claim 6, it is characterised in that the prefix word detection module is specifically used for based on before including Sew the parallel search network of word model and filler model, detect and whether there is institute in the voice data lighted from the speech front-end State prefix word sound.
- 9. system according to claim 6, it is characterised in that the judge module is specifically used for judging in order word network With the presence or absence of the order word to match with institute speech recognition result, such as exist, then judge that institute's speech recognition result is effective.
- 10. the system according to any one of claim 6 to 9, it is characterised in that the instruction, which obtains, terminates event package Include:Institute's speech segment terminates to continue setting time with institute speech segment.
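Claims 4, 5, 9 and 10 describe two checks that lend themselves to a short sketch: validating a recognition result against a command word network, and firing the instruction acquisition termination event once the voice segment has ended and silence has persisted for a set time. Below is a minimal Python illustration under stated assumptions: the flat command set, the class and function names, and the timeout value are all illustrative inventions, not taken from the patent (a real system would decode against a command word network rather than look up a plain set):

```python
# Hypothetical command word "network": modeled here as a flat set of
# valid command strings for illustration only.
COMMAND_WORDS = {"turn on the light", "turn off the light", "play music"}

def is_result_valid(recognition_result: str) -> bool:
    """Claims 4/9: a result is valid iff a matching command word exists."""
    return recognition_result.strip().lower() in COMMAND_WORDS

class InstructionCapture:
    """Claims 5/10: instruction acquisition terminates once the voice
    segment has ended and silence has lasted for a set time."""

    def __init__(self, silence_timeout_s: float = 0.8):
        self.silence_timeout_s = silence_timeout_s
        self._last_voice_time = None  # timestamp of the last voiced frame

    def feed(self, frame_has_voice: bool, now: float) -> bool:
        """Feed one audio frame; return True when the termination event fires."""
        if frame_has_voice:
            self._last_voice_time = now
            return False
        if self._last_voice_time is None:
            return False  # no speech captured yet; keep waiting
        return (now - self._last_voice_time) >= self.silence_timeout_s
```

For example, with a 0.5 s timeout, a voiced frame at t=0.0 followed by silent frames at t=0.3 and t=0.6 would terminate acquisition on the second silent frame.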
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410782284.4A CN104464723B (en) | 2014-12-16 | 2014-12-16 | A kind of voice interactive method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104464723A CN104464723A (en) | 2015-03-25 |
CN104464723B true CN104464723B (en) | 2018-03-20 |
Family
ID=52910674
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410782284.4A Active CN104464723B (en) | 2014-12-16 | 2014-12-16 | A kind of voice interactive method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104464723B (en) |
Families Citing this family (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106782547B (en) * | 2015-11-23 | 2020-08-07 | 芋头科技(杭州)有限公司 | Robot semantic recognition system based on voice recognition |
CN105529028B (en) * | 2015-12-09 | 2019-07-30 | 百度在线网络技术(北京)有限公司 | Speech analysis method and apparatus |
CN106887227A (en) * | 2015-12-16 | 2017-06-23 | 芋头科技(杭州)有限公司 | A kind of voice awakening method and system |
CN105869637B (en) | 2016-05-26 | 2019-10-15 | 百度在线网络技术(北京)有限公司 | Voice awakening method and device |
CN105931639B (en) * | 2016-05-31 | 2019-09-10 | 杨若冲 | A kind of voice interactive method for supporting multistage order word |
CN106157950A (en) * | 2016-09-29 | 2016-11-23 | 合肥华凌股份有限公司 | Speech control system and awakening method, Rouser and household electrical appliances, coprocessor |
CN106653013B (en) * | 2016-09-30 | 2019-12-20 | 北京奇虎科技有限公司 | Voice recognition method and device |
CN106571144A (en) * | 2016-11-08 | 2017-04-19 | 广东小天才科技有限公司 | Search method and device based on voice recognition |
CN107145329A (en) * | 2017-04-10 | 2017-09-08 | 北京猎户星空科技有限公司 | Apparatus control method, device and smart machine |
US10311874B2 (en) | 2017-09-01 | 2019-06-04 | 4Q Catalyst, LLC | Methods and systems for voice-based programming of a voice-controlled device |
CN107731226A (en) * | 2017-09-29 | 2018-02-23 | 杭州聪普智能科技有限公司 | Control method, device and electronic equipment based on speech recognition |
CN108172219B (en) * | 2017-11-14 | 2021-02-26 | 珠海格力电器股份有限公司 | Method and device for recognizing voice |
CN107886944B (en) * | 2017-11-16 | 2021-12-31 | 出门问问创新科技有限公司 | Voice recognition method, device, equipment and storage medium |
CN107919124B (en) * | 2017-12-22 | 2021-07-13 | 北京小米移动软件有限公司 | Equipment awakening method and device |
CN110299137B (en) * | 2018-03-22 | 2023-12-12 | 腾讯科技(深圳)有限公司 | Voice interaction method and device |
CN108735210A (en) * | 2018-05-08 | 2018-11-02 | 宇龙计算机通信科技(深圳)有限公司 | A kind of sound control method and terminal |
WO2019222996A1 (en) * | 2018-05-25 | 2019-11-28 | Beijing Didi Infinity Technology And Development Co., Ltd. | Systems and methods for voice recognition |
CN108922531B (en) * | 2018-07-26 | 2020-10-27 | 腾讯科技(北京)有限公司 | Slot position identification method and device, electronic equipment and storage medium |
CN109147779A (en) * | 2018-08-14 | 2019-01-04 | 苏州思必驰信息科技有限公司 | Voice data processing method and device |
JP6992713B2 (en) * | 2018-09-11 | 2022-01-13 | 日本電信電話株式会社 | Continuous utterance estimation device, continuous utterance estimation method, and program |
CN109147764A (en) * | 2018-09-20 | 2019-01-04 | 百度在线网络技术(北京)有限公司 | Voice interactive method, device, equipment and computer-readable medium |
CN108962250A (en) * | 2018-09-26 | 2018-12-07 | 出门问问信息科技有限公司 | Audio recognition method, device and electronic equipment |
CN111063356B (en) * | 2018-10-17 | 2023-05-09 | 北京京东尚科信息技术有限公司 | Electronic equipment response method and system, sound box and computer readable storage medium |
CN109887493B (en) * | 2019-03-13 | 2021-08-31 | 安徽声讯信息技术有限公司 | Character audio pushing method |
CN110930989B (en) * | 2019-11-27 | 2021-04-06 | 深圳追一科技有限公司 | Speech intention recognition method and device, computer equipment and storage medium |
CN111524512A (en) * | 2020-04-14 | 2020-08-11 | 苏州思必驰信息科技有限公司 | Method for starting one-shot voice conversation with low delay, peripheral equipment and voice interaction device with low delay response |
CN113643691A (en) * | 2021-08-16 | 2021-11-12 | 思必驰科技股份有限公司 | Far-field voice message interaction method and system |
CN113971953A (en) * | 2021-09-17 | 2022-01-25 | 珠海格力电器股份有限公司 | Voice command word recognition method and device, storage medium and electronic equipment |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102999161B (en) * | 2012-11-13 | 2016-03-02 | 科大讯飞股份有限公司 | A kind of implementation method of voice wake-up module and application |
CN103077165A (en) * | 2012-12-31 | 2013-05-01 | 威盛电子股份有限公司 | Natural language dialogue method and system thereof |
CN103220423A (en) * | 2013-04-10 | 2013-07-24 | 威盛电子股份有限公司 | Voice answering method and mobile terminal device |
CN103595869A (en) * | 2013-11-15 | 2014-02-19 | 华为终端有限公司 | Terminal voice control method and device and terminal |
CN103632667B (en) * | 2013-11-25 | 2017-08-04 | 华为技术有限公司 | acoustic model optimization method, device and voice awakening method, device and terminal |
CN103714815A (en) * | 2013-12-09 | 2014-04-09 | 何永 | Voice control method and device thereof |
- 2014-12-16: Application CN201410782284.4A filed in CN; granted as patent CN104464723B; legal status Active
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109708256A (en) * | 2018-12-06 | 2019-05-03 | 珠海格力电器股份有限公司 | Voice determination method and device, storage medium and air conditioner |
CN109708256B (en) * | 2018-12-06 | 2020-07-03 | 珠海格力电器股份有限公司 | Voice determination method and device, storage medium and air conditioner |
Also Published As
Publication number | Publication date |
---|---|
CN104464723A (en) | 2015-03-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104464723B (en) | A kind of voice interactive method and system | |
KR102134201B1 (en) | Method, apparatus, and storage medium for constructing speech decoding network in numeric speech recognition | |
CN107767863B (en) | Voice awakening method and system and intelligent terminal | |
CN107767861B (en) | Voice awakening method and system and intelligent terminal | |
CN103021409B (en) | A kind of voice activation camera system | |
CN103943105A (en) | Voice interaction method and system | |
US9286897B2 (en) | Speech recognizer with multi-directional decoding | |
US9354687B2 (en) | Methods and apparatus for unsupervised wakeup with time-correlated acoustic events | |
US9070367B1 (en) | Local speech recognition of frequent utterances | |
US9437186B1 (en) | Enhanced endpoint detection for speech recognition | |
CN103095911B (en) | Method and system for finding mobile phone through voice awakening | |
CN103544955B (en) | Method for recognizing voice and electronic device thereof | |
CN109979474B (en) | Voice equipment and user speech rate correction method and device thereof and storage medium | |
CN104036774A (en) | Method and system for recognizing Tibetan dialects | |
US9335966B2 (en) | Methods and apparatus for unsupervised wakeup | |
CN105336324A (en) | Language identification method and device | |
Shriberg et al. | Learning When to Listen: Detecting System-Addressed Speech in Human-Human-Computer Dialog. | |
KR20170139650A (en) | Method for adding accounts, terminals, servers, and computer storage media | |
Kim et al. | Multistage data selection-based unsupervised speaker adaptation for personalized speech emotion recognition | |
CN112102850A (en) | Processing method, device and medium for emotion recognition and electronic equipment | |
US20170032778A1 (en) | Method, apparatus, and computer-readable recording medium for improving at least one semantic unit set | |
JP6915637B2 (en) | Information processing equipment, information processing methods, and programs | |
US10417345B1 (en) | Providing customer service agents with customer-personalized result of spoken language intent | |
CN105869622B (en) | Chinese hot word detection method and device | |
CN110853669B (en) | Audio identification method, device and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||