Summary of the invention
The purpose of the embodiments of the present invention is to provide a voice control method and device that solve the above technical problem, or at least partially solve it.
To achieve this purpose, an embodiment of the present invention provides a voice control method, the method including: receiving a control voice signal; comparing the received control voice signal with the speech feature vectors in a voice library, and determining a prompt voice signal and a control operation corresponding to the control voice signal, wherein the prompt voice signal describes the control operation; outputting the determined prompt voice signal; receiving a feedback voice signal for the prompt voice signal; and judging whether the feedback voice signal represents a voice confirmation of the control operation, and performing the control operation when the feedback voice signal represents a voice confirmation of the control operation.
Preferably, the method further includes: after a trigger instruction is received or a trigger event is detected, triggering the start of a voice control mode.
Preferably, comparing the received control voice signal with the speech feature vectors in the voice library includes: converting the control voice signal into a speech pulse sequence; extracting a speech feature vector from the speech pulse sequence; and comparing the extracted speech feature vector with the speech feature vectors in the voice library.
Preferably, the method further includes: before the speech feature vector is extracted from the speech pulse sequence, filtering interference signals out of the converted speech pulse sequence.
Preferably, comparing the extracted speech feature vector with the speech feature vectors in the voice library includes: quantizing the extracted speech feature vector into a standard speech feature vector; and comparing the quantized standard speech feature vector with the speech feature vectors in the voice library.
Preferably, judging whether the feedback voice signal represents a voice confirmation of the control operation includes: comparing the feedback voice signal with the speech feature vectors in a confirmation voice set; and, when the feedback voice signal matches at least one speech feature vector in the confirmation voice set, determining that the feedback voice signal represents a voice confirmation of the control operation.
Preferably, the method further includes: when the feedback voice signal does not represent a voice confirmation of the control operation, cancelling the determined control operation.
According to another aspect of the embodiments of the present invention, a voice control device is provided, the device including: a receiving module, configured to receive a control voice signal; a processing module, configured to compare the received control voice signal with the speech feature vectors in a voice library and to determine a prompt voice signal and a control operation corresponding to the control voice signal, wherein the prompt voice signal describes the control operation; and an output module, configured to output the determined prompt voice signal. The receiving module is further configured to receive a feedback voice signal for the prompt voice signal. The processing module is further configured to judge whether the feedback voice signal represents a voice confirmation of the control operation, and to perform the control operation when the feedback voice signal represents a voice confirmation of the control operation.
Preferably, the device further includes: a trigger module, configured to trigger the start of a voice control mode after a trigger instruction is received or a trigger event is detected.
Preferably, the processing module is configured to convert the control voice signal into a speech pulse sequence; extract a speech feature vector from the speech pulse sequence; and compare the extracted speech feature vector with the speech feature vectors in the voice library.
Preferably, the processing module is further configured to filter interference signals out of the converted speech pulse sequence before the speech feature vector is extracted from the speech pulse sequence.
Preferably, the processing module is configured to quantize the extracted speech feature vector into a standard speech feature vector, and to compare the quantized standard speech feature vector with the speech feature vectors in the voice library.
Preferably, the processing module is configured to compare the feedback voice signal with the speech feature vectors in a confirmation voice set and, when the feedback voice signal matches at least one speech feature vector in the confirmation voice set, to determine that the feedback voice signal represents a voice confirmation of the control operation.
Preferably, the processing module is further configured to cancel the determined control operation when the feedback voice signal does not represent a voice confirmation of the control operation.
With the above technical solution, the method includes: receiving a control voice signal; comparing the received control voice signal with the speech feature vectors in a voice library, and determining a prompt voice signal and a control operation corresponding to the control voice signal; outputting the determined prompt voice signal; receiving a feedback voice signal for the prompt voice signal; and judging whether the feedback voice signal represents a voice confirmation of the control operation, and performing the control operation when it does. In this way, the electronic device can be controlled by voice when the touch screen is damaged or the keys have failed, which solves the problem that the electronic device cannot be used in an emergency. In addition, the voice confirmation ensures that the control operation performed is consistent with the operation the user wants performed, which further improves the accuracy of the control operation.
Further features and advantages of the embodiments of the present invention are described in detail in the detailed description section that follows.
Detailed description of the invention
Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here are only intended to illustrate and explain the embodiments of the present invention, and are not intended to limit them.
Fig. 1 is a flow chart of the voice control method according to Embodiment One of the present invention. The method can be used in various electronic devices, such as smartphones and tablet computers. As shown in Fig. 1, in Embodiment One the method may include the following steps.
In step S110, a control voice signal is received.
For example, the electronic device can receive the control voice signal input by the user through a device such as a microphone.
In Embodiment One, the method may further include: after a trigger instruction is received or a trigger event is detected, triggering the start of the voice control mode.
For example, the voice control mode can be triggered when a specific key or combination of keys is pressed, or when a specific icon on the touch screen is tapped. In addition, a key fault or touch screen damage can serve as a trigger event: when a key fault or touch screen damage is detected, the voice control mode is triggered. The electronic device then receives the control voice signal input by the user in voice control mode.
In the above technical solution, the voice control mode can be switched on manually or automatically. Receiving control voice signals only in voice control mode avoids mistakenly processing non-control voice signals as control voice signals and thereby causing unnecessary operations.
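The mode switching described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the event names and the `VoiceControlState` class are hypothetical, since the text only names the kinds of trigger (key press, icon tap, key fault, touch screen damage).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical event names; the text only describes the kinds of trigger.
TRIGGER_INSTRUCTIONS = {"key_combo_pressed", "voice_icon_tapped"}   # manual
TRIGGER_EVENTS = {"key_fault", "touch_screen_damaged"}              # automatic


@dataclass
class VoiceControlState:
    active: bool = False

    def on_event(self, name: str) -> bool:
        """Enter voice control mode on a trigger instruction or trigger event."""
        if name in TRIGGER_INSTRUCTIONS or name in TRIGGER_EVENTS:
            self.active = True
        return self.active

    def accept_audio(self, audio: bytes) -> Optional[bytes]:
        """Pass audio on for processing only while the mode is active."""
        return audio if self.active else None
```

Gating audio on `active` is what keeps ordinary speech from being processed as a control voice signal before the mode has been triggered.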
In step S120, the received control voice signal is compared with the speech feature vectors in the voice library, and the prompt voice signal and control operation corresponding to the control voice signal are determined.
The prompt voice signal describes the control operation.
In Embodiment One, as shown in Fig. 2, comparing the received control voice signal with the speech feature vectors in the voice library may include the following steps.
In step S122, the control voice signal is converted into a speech pulse sequence.
In step S124, a speech feature vector is extracted from the speech pulse sequence.
In step S126, the extracted speech feature vector is compared with the speech feature vectors in the voice library.
Further, in Embodiment One of the present invention, the method may also include filtering interference signals out of the converted speech pulse sequence before the speech feature vector is extracted from it.
Further, comparing the extracted speech feature vector with the speech feature vectors in the voice library may include quantizing the extracted speech feature vector into a standard speech feature vector and comparing the quantized standard speech feature vector with the speech feature vectors in the voice library.
For example, the processing module of the terminal device (for example, a general-purpose processor or a dedicated speech signal processor) can convert the control voice signal into a speech pulse sequence and filter the interference signals out of the speech pulse sequence. A speech feature vector is extracted from the filtered signal and converted into a standard speech feature vector. The standard speech feature vector is compared with the speech feature vectors in the voice library. Each speech feature vector in the voice library has a corresponding prompt voice signal and control operation. The comparison identifies the speech feature vector in the voice library that matches the standard speech feature vector, and the prompt voice signal and control operation of the matched vector are taken as those corresponding to the control voice signal. For example, the control voice signal may be "open camera", the corresponding prompt voice signal may be "Do you need to open the camera now?", and the control operation is the operation of opening the camera.
In the above technical solution, filtering out the interference signals improves the quality of the speech feature vector and hence the accuracy of the comparison with the speech feature vectors in the voice library. Quantizing the extracted speech feature vector into a standard speech feature vector further simplifies that comparison.
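The filter, extract, quantize and compare chain can be illustrated with a toy sketch. The text does not specify the filter, the features, the quantizer or the matching rule, so everything here is an assumption chosen for brevity: a moving-average filter, per-frame mean amplitude as the feature, a uniform grid as the quantizer, and Euclidean distance with a threshold as the match. The library entries are hypothetical.

```python
import math

# Hypothetical voice library: each vector carries a prompt and an operation.
VOICE_LIBRARY = [
    {"vector": (5, 5, 0, 0), "prompt": "Open the camera now?", "operation": "open_camera"},
    {"vector": (0, 0, 5, 5), "prompt": "Dial now?", "operation": "dial"},
]


def filter_interference(samples, window=3):
    """Moving-average smoothing, standing in for the unspecified filter."""
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def extract_features(samples, frames=4):
    """Mean absolute amplitude per frame, a toy speech feature vector."""
    size = max(1, len(samples) // frames)
    return [sum(abs(x) for x in samples[i * size:(i + 1) * size]) / size
            for i in range(frames)]


def quantize(vector, step=0.1):
    """Snap each component to an integer level (the 'standard' vector)."""
    return tuple(round(v / step) for v in vector)


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match(standard_vector, library, max_distance=1.0):
    """Return (prompt, operation) of the closest entry, or None if too far."""
    best = min(library, key=lambda e: distance(standard_vector, e["vector"]),
               default=None)
    if best is None or distance(standard_vector, best["vector"]) > max_distance:
        return None
    return best["prompt"], best["operation"]


# Usage: a signal that is loud in its first half and silent in its second.
samples = [0.5] * 8 + [0.0] * 8
standard = quantize(extract_features(filter_interference(samples)))
result = match(standard, VOICE_LIBRARY)
```

Quantizing before the comparison means slightly different inputs land on the same grid point, which is one way to read the remark that the standard vector makes comparison with the library easier.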
In step S130, the determined prompt voice signal is output.
For example, for the control voice signal "open camera", the determined corresponding prompt voice signal "Do you need to open the camera now?" is output through a device such as a loudspeaker. After hearing the prompt voice signal, if the determined control operation matches the operation the user wants, the user replies with a feedback voice signal that expresses confirmation, such as "yes", "OK" or "confirm". If the determined control operation does not match the operation the user wants, the user replies with a feedback voice signal that does not express confirmation, such as "no" or "please cancel".
In step S140, the feedback voice signal for the prompt voice signal is received.
After outputting the prompt voice signal, the electronic device can receive the feedback voice signal with which the user replies through a device such as a microphone.
In step S150, it is judged whether the feedback voice signal represents a voice confirmation of the control operation; when the feedback voice signal represents a voice confirmation of the control operation, the control operation is performed.
In Embodiment One, as shown in Fig. 3, judging whether the feedback voice signal represents a voice confirmation of the control operation may include the following steps.
In step S152, the feedback voice signal is compared with the speech feature vectors in the confirmation voice set.
In step S154, when the feedback voice signal matches at least one speech feature vector in the confirmation voice set, it is determined that the feedback voice signal represents a voice confirmation of the control operation.
For example, the confirmation voice set can include the speech feature vectors of various expressions that confirm a control operation. The feedback voice signal is converted into a speech pulse sequence, and the interference signals in the speech pulse sequence are filtered out. A speech feature vector is extracted from the filtered signal and converted into a standard speech feature vector. The standard speech feature vector is compared with the speech feature vectors in the confirmation voice set. If the feedback voice signal matches a speech feature vector in the confirmation voice set, it is determined that the user has confirmed the control operation by voice. The electronic device then starts to perform the control operation.
In Embodiment One, the method may further include cancelling the determined control operation when the feedback voice signal does not represent a voice confirmation of the control operation. For example, after hearing the prompt voice signal, the user replies with a feedback voice signal such as "no" or "please cancel". The feedback voice signal is compared with the speech feature vectors in the confirmation voice set; if it matches none of them, it is judged that the feedback voice signal does not represent a voice confirmation of the control operation. The electronic device then decides not to perform the determined control operation and can end this processing flow.
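The confirm-or-cancel decision of steps S152 and S154 can be sketched as follows. The sketch assumes the feedback voice signal has already been reduced to a standard feature vector, represented here as a tuple of integers. The function names and the per-component tolerance are illustrative assumptions, not part of the described method.

```python
from typing import Callable, Iterable, Optional, Sequence


def vectors_match(a: Sequence[int], b: Sequence[int], tolerance: int = 0) -> bool:
    """Two vectors match if every component differs by at most `tolerance`."""
    return len(a) == len(b) and all(abs(x - y) <= tolerance for x, y in zip(a, b))


def is_voice_confirmation(feedback: Sequence[int],
                          confirmation_set: Iterable[Sequence[int]],
                          tolerance: int = 0) -> bool:
    """S152/S154: confirmed if the feedback matches at least one set member."""
    return any(vectors_match(feedback, v, tolerance) for v in confirmation_set)


def confirm_or_cancel(feedback: Sequence[int],
                      confirmation_set: Iterable[Sequence[int]],
                      operation: Callable[[], str]) -> Optional[str]:
    """Perform the operation on confirmation; otherwise cancel (return None)."""
    if is_voice_confirmation(feedback, confirmation_set):
        return operation()
    return None
```

Note that cancellation needs no separate "negative" set in this reading: any feedback that fails to match the confirmation voice set, including "no" or "please cancel", leads to the operation being dropped.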
With the technical solution of Embodiment One, the electronic device can be controlled by voice when the touch screen is damaged or the keys have failed, which solves the problem that the electronic device cannot be used in an emergency. The voice confirmation ensures that the control operation performed is consistent with the operation the user wants performed, which further improves the accuracy of the control operation.
Fig. 4 is a flow chart of the voice control method according to Embodiment Two of the present invention. As shown in Fig. 4, in Embodiment Two the method may include the following steps.
The user presses a voice control mode trigger key (for example, the volume up/down keys), which generates a trigger instruction.
In step S402, the trigger instruction is received, and the voice control mode is triggered and started.
After the voice control mode is entered, the user speaks "dial" into the microphone. In step S404, the control voice signal is received in voice control mode.
In step S406, the received control voice signal is converted into a speech pulse sequence, the interference signals in the speech pulse sequence are filtered out, a speech feature vector is extracted from the filtered signal, and the extracted speech feature vector is converted into a standard speech feature vector.
In step S408, the standard speech feature vector is compared with the speech feature vectors in the voice library, and the speech feature vector in the voice library that matches the standard speech feature vector is determined. Each speech feature vector in the voice library has a corresponding prompt voice signal and control operation. For example, the prompt voice signal corresponding to the speech feature vector that matches the control voice signal "dial" may be "Do you need to dial now?", and the control operation is the dialing operation.
In step S410, the prompt voice signal and control operation of the matched speech feature vector are taken as the prompt voice signal and control operation corresponding to the control voice signal. The prompt voice signal describes the control operation.
In step S412, the determined prompt voice signal is output, for example "Do you need to dial now?". After hearing the prompt voice signal, the user answers "yes" if the control operation is the desired operation, and "no" if it is not.
In step S414, the feedback voice signal is received.
In step S416, the feedback voice signal is compared with the speech feature vectors in the confirmation voice set to judge whether it matches at least one of them. If it does, the feedback voice signal represents a voice confirmation of the control operation, and step S418 is performed; otherwise, the feedback voice signal does not confirm the control operation, and step S420 is performed. The process of comparing the feedback voice signal with the speech feature vectors in the confirmation voice set is similar to the process of comparing the standard speech feature vector with the speech feature vectors in the voice library and is not repeated here.
In step S418, the control operation is performed; for example, the dialing operation is carried out.
In step S420, the control operation is cancelled; for example, the dialing operation is not carried out, and the flow returns to the stage of inputting a control voice signal or exits the voice control mode.
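The control flow of steps S402 to S420 can be summarized in a short sketch. To keep it readable, recognized words stand in for matched feature vectors, so the signal processing of S406 and S408 is reduced to a dictionary lookup. The `LIBRARY` and `CONFIRMATIONS` contents, the function name, and the `"no_match"` outcome are assumptions added for illustration; the text does not say what happens when no library entry matches.

```python
# Hypothetical stand-ins: recognized words replace matched feature vectors.
LIBRARY = {"dial": ("Do you need to dial now?", "dial_operation")}
CONFIRMATIONS = {"yes", "ok", "confirm"}


def voice_control_flow(triggered, utterances, speak, execute):
    """Walk through S402-S420 once and report how the flow ended."""
    if not triggered:                    # S402: no trigger, mode never starts
        return "idle"
    source = iter(utterances)
    command = next(source, None)         # S404: receive control voice signal
    entry = LIBRARY.get(command)         # S406-S410: find the matching entry
    if entry is None:
        return "no_match"                # assumption: not covered by the text
    prompt, operation = entry
    speak(prompt)                        # S412: output the prompt voice signal
    feedback = next(source, None)        # S414: receive feedback voice signal
    if feedback in CONFIRMATIONS:        # S416: compare with confirmation set
        execute(operation)               # S418: perform the control operation
        return "executed"
    return "cancelled"                   # S420: cancel the control operation
```

A caller could pass `print` for `speak` and any callable for `execute`; the return value only reports which branch of Fig. 4 was taken.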
In this way, processing the input control voice signal and feedback voice signal improves the accuracy of matching against the speech feature vectors; controlling the electronic device by voice solves the problem that the electronic device cannot be used in an emergency; and the voice confirmation ensures that the control operation performed is consistent with the operation the user wants performed, which further improves the accuracy of the control operation.
Fig. 5 is a structural diagram of the voice control device according to Embodiment Three of the present invention. The device can be used in various electronic devices, such as smartphones and tablet computers. As shown in Fig. 5, in Embodiment Three the device may include the following modules.
A receiving module 510, configured to receive a control voice signal;
a processing module 520, configured to compare the received control voice signal with the speech feature vectors in the voice library and to determine the prompt voice signal and control operation corresponding to the control voice signal, wherein the prompt voice signal describes the control operation;
an output module 530, configured to output the determined prompt voice signal.
The receiving module 510 is further configured to receive the feedback voice signal for the prompt voice signal.
The processing module 520 is further configured to judge whether the feedback voice signal represents a voice confirmation of the control operation, and to perform the control operation when the feedback voice signal represents a voice confirmation of the control operation.
In Embodiment Three, the processing module 520 can be configured to convert the control voice signal into a speech pulse sequence, extract a speech feature vector from the speech pulse sequence, and compare the extracted speech feature vector with the speech feature vectors in the voice library.
In Embodiment Three, the processing module 520 can be further configured to filter interference signals out of the converted speech pulse sequence before the speech feature vector is extracted from it.
In Embodiment Three, the processing module 520 can be configured to quantize the extracted speech feature vector into a standard speech feature vector and to compare the quantized standard speech feature vector with the speech feature vectors in the voice library.
In Embodiment Three, the processing module 520 can be configured to compare the feedback voice signal with the speech feature vectors in the confirmation voice set and, when the feedback voice signal matches at least one speech feature vector in the confirmation voice set, to determine that the feedback voice signal represents a voice confirmation of the control operation.
In Embodiment Three, the processing module 520 can be further configured to cancel the determined control operation when the feedback voice signal does not represent a voice confirmation of the control operation.
In Embodiment Four, as shown in Fig. 6, the device may further include a trigger module 610, configured to trigger the start of the voice control mode after a trigger instruction is received or a trigger event is detected.
The above device corresponds to the foregoing method. For examples of its specific implementation, refer to the detailed description of the foregoing method, which is not repeated here.
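The division of work among modules 510, 520 and 530 might be sketched as three small classes wired together. This is only an illustration under simplifying assumptions: recognized words stand in for voice signals and feature vectors, and all class, field and method names are hypothetical. The trigger module 610 of Embodiment Four is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class ReceivingModule:                     # module 510
    queue: List[str] = field(default_factory=list)

    def receive(self) -> Optional[str]:
        """Return the next control or feedback signal, if any."""
        return self.queue.pop(0) if self.queue else None


@dataclass
class OutputModule:                        # module 530
    spoken: List[str] = field(default_factory=list)

    def output(self, prompt: str) -> None:
        self.spoken.append(prompt)


@dataclass
class ProcessingModule:                    # module 520
    library: Dict[str, Tuple[str, str]]
    confirmations: Set[str]

    def determine(self, control: Optional[str]) -> Optional[Tuple[str, str]]:
        """Look up the prompt and control operation for a control signal."""
        return self.library.get(control)

    def is_confirmed(self, feedback: Optional[str]) -> bool:
        return feedback in self.confirmations


@dataclass
class VoiceControlDevice:
    receiver: ReceivingModule
    processor: ProcessingModule
    out: OutputModule

    def handle(self) -> Optional[str]:
        """Return the operation to perform, or None if unmatched or cancelled."""
        found = self.processor.determine(self.receiver.receive())
        if found is None:
            return None
        prompt, operation = found
        self.out.output(prompt)
        confirmed = self.processor.is_confirmed(self.receiver.receive())
        return operation if confirmed else None
```

The receiving module is called twice in `handle`, mirroring the text: once for the control voice signal and once for the feedback voice signal.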
The preferred embodiments of the present invention have been described in detail above with reference to the accompanying drawings. However, the embodiments of the present invention are not limited to the specific details of the above embodiments. Within the scope of the technical concept of the embodiments of the present invention, many simple variations can be made to the technical solution, and these simple variations all fall within the scope of protection of the embodiments of the present invention.
It should further be noted that the specific technical features described in the above embodiments can be combined in any suitable way, provided they do not contradict one another. To avoid unnecessary repetition, the embodiments of the present invention do not describe the various possible combinations separately.
Those skilled in the art will understand that all or part of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program is stored in a storage medium and includes a number of instructions that cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to perform all or part of the steps of the methods described in the embodiments of this application. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.
In addition, the various embodiments of the present invention can also be combined with one another in any way. As long as a combination does not depart from the idea of the embodiments of the present invention, it should likewise be regarded as content disclosed by the embodiments of the present invention.