CN106227498A

CN106227498A - Voice control method and apparatus

Info

Publication number: CN106227498A
Application number: CN201610580086.9A
Authority: CN
Inventors: 李国辉
Original assignee: Leauto Intelligent Technology Beijing Co Ltd; LeTV Holding Beijing Co Ltd
Current assignee: Fafa Automobile (china) Co Ltd
Priority date: 2016-07-21
Filing date: 2016-07-21
Publication date: 2016-12-14

Abstract

Embodiments of the present invention provide a voice control method and apparatus, belonging to the field of information technology. The method includes: receiving a control voice signal; comparing the received control voice signal with speech feature vectors in a speech library to determine a prompt voice signal and a control operation corresponding to the control voice signal, wherein the prompt voice signal describes the control operation; outputting the determined prompt voice signal; receiving a feedback voice signal in response to the prompt voice signal; and judging whether the feedback voice signal represents a voice confirmation of the control operation, and performing the control operation when it does. With the embodiments of the present invention, an electronic device can be controlled by voice, which solves the problem that the device cannot be used in an emergency when its touch screen is damaged or its keys have failed.

Description

Voice control method and apparatus

Technical field

The present invention relates to the field of information technology, and in particular to a voice control method and apparatus.

Background technology

With the development of electronic technology, a wide variety of electronic devices have appeared. Users can control the operations of an electronic device through key input and touch screen taps. In practice, the inventor of the present application found that situations can arise during use in which an electronic device cannot be operated through its keys or touch screen. For example, the touch screen of a touch screen device is fragile and may be accidentally broken in everyday use. In addition, some main function keys of an electronic device (such as the dial key or the answer key) may become unusable because of a fault. When some main function keys cannot be used or the touch screen is broken, the electronic device cannot receive the user's control instructions and therefore cannot be used. If an emergency then requires the electronic device, for example to make an emergency call on a mobile phone, the device cannot be used because it cannot receive control instructions. The inventor therefore found that the prior art has the defect that an electronic device becomes unusable when its touch screen is damaged or its keys fail.

Summary of the invention

The purpose of the embodiments of the present invention is to provide a voice control method and apparatus that solve the above technical problem, or at least solve it in part.

To achieve this purpose, an embodiment of the present invention provides a voice control method. The method includes: receiving a control voice signal; comparing the received control voice signal with speech feature vectors in a speech library to determine a prompt voice signal and a control operation corresponding to the control voice signal, wherein the prompt voice signal describes the control operation; outputting the determined prompt voice signal; receiving a feedback voice signal in response to the prompt voice signal; and judging whether the feedback voice signal represents a voice confirmation of the control operation, and performing the control operation when the feedback voice signal represents a voice confirmation of the control operation.

Preferably, the method further includes: triggering the start of a voice control mode after a trigger instruction is received or a trigger event is detected.

Preferably, comparing the received control voice signal with the speech feature vectors in the speech library includes: converting the control voice signal into a speech pulse sequence; extracting a speech feature vector from the speech pulse sequence; and comparing the extracted speech feature vector with the speech feature vectors in the speech library.

Preferably, the method further includes: before the speech feature vector is extracted from the speech pulse sequence, filtering interference signals out of the converted speech pulse sequence.

Preferably, comparing the extracted speech feature vector with the speech feature vectors in the speech library includes: quantizing the extracted speech feature vector into a standard speech feature vector; and comparing the quantized standard speech feature vector with the speech feature vectors in the speech library.

Preferably, judging whether the feedback voice signal represents a voice confirmation of the control operation includes: comparing the feedback voice signal with the speech feature vectors in a confirmation voice set; and, when the feedback voice signal matches at least one speech feature vector in the confirmation voice set, determining that the feedback voice signal represents a voice confirmation of the control operation.

Preferably, the method further includes: cancelling the determined control operation when the feedback voice signal does not represent a voice confirmation of the control operation.

According to another aspect of the embodiments of the present invention, a voice control apparatus is provided. The apparatus includes: a receiving module for receiving a control voice signal; a processing module for comparing the received control voice signal with speech feature vectors in a speech library to determine a prompt voice signal and a control operation corresponding to the control voice signal, wherein the prompt voice signal describes the control operation; and an output module for outputting the determined prompt voice signal. The receiving module is further used to receive a feedback voice signal in response to the prompt voice signal. The processing module is further used to judge whether the feedback voice signal represents a voice confirmation of the control operation, and to perform the control operation when the feedback voice signal represents a voice confirmation of the control operation.

Preferably, the apparatus further includes: a trigger module for triggering the start of a voice control mode after a trigger instruction is received or a trigger event is detected.

Preferably, the processing module is used to convert the control voice signal into a speech pulse sequence, extract a speech feature vector from the speech pulse sequence, and compare the extracted speech feature vector with the speech feature vectors in the speech library.

Preferably, the processing module is further used to filter interference signals out of the converted speech pulse sequence before the speech feature vector is extracted from it.

Preferably, the processing module is used to quantize the extracted speech feature vector into a standard speech feature vector and to compare the quantized standard speech feature vector with the speech feature vectors in the speech library.

Preferably, the processing module is used to compare the feedback voice signal with the speech feature vectors in a confirmation voice set and, when the feedback voice signal matches at least one speech feature vector in the confirmation voice set, to determine that the feedback voice signal represents a voice confirmation of the control operation.

Preferably, the processing module is further used to cancel the determined control operation when the feedback voice signal does not represent a voice confirmation of the control operation.

With the above technical solution, the method includes: receiving a control voice signal; comparing the received control voice signal with speech feature vectors in a speech library to determine a prompt voice signal and a control operation corresponding to the control voice signal; outputting the determined prompt voice signal; receiving a feedback voice signal in response to the prompt voice signal; and judging whether the feedback voice signal represents a voice confirmation of the control operation, and performing the control operation when it does. In this way, the electronic device can be controlled by voice when its touch screen is damaged or its keys have failed, which solves the problem that the device cannot be used in an emergency. The voice confirmation also ensures that the control operation performed is the one the user intends, which further improves the accuracy of the control operation.

Other features and advantages of the embodiments of the present invention are described in detail in the detailed description that follows.

Brief description of the drawings

The accompanying drawings provide a further understanding of the embodiments of the present invention and constitute a part of the specification. Together with the detailed description below, they serve to explain the embodiments of the present invention but do not limit them. In the drawings:

Fig. 1 is a flow chart of the voice control method according to Embodiment One of the present invention;

Fig. 2 is a flow chart of the processing of the control voice signal according to Embodiment One of the present invention;

Fig. 3 is a flow chart of the processing of the feedback voice signal according to Embodiment One of the present invention;

Fig. 4 is a flow chart of the voice control method according to Embodiment Two of the present invention;

Fig. 5 is a structural diagram of the voice control apparatus according to Embodiment Three of the present invention; and

Fig. 6 is a structural diagram of the voice control apparatus according to Embodiment Four of the present invention.

Detailed description of the invention

The embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be understood that the specific implementations described here serve only to illustrate and explain the embodiments of the present invention and are not intended to limit them.

Fig. 1 is a flow chart of the voice control method according to Embodiment One of the present invention. The method can be used in various electronic devices, such as smartphones and tablet computers. As shown in Fig. 1, in Embodiment One the method may include the following steps.

In step S110, a control voice signal is received.

For example, the electronic device may receive the control voice signal input by the user through a device such as a microphone.

In Embodiment One, the method may further include: triggering the start of a voice control mode after a trigger instruction is received or a trigger event is detected.

For example, voice control mode may be entered when a specific key or combination of keys is pressed, or when a specific icon on the touch screen is tapped. A key fault or touch screen damage may also serve as a trigger event, so that voice control mode is entered when a key fault or touch screen damage is detected. The electronic device then receives the control voice signal input by the user in voice control mode.

In the above technical solution, voice control mode can be switched on manually or automatically, and control voice signals are received in voice control mode. This avoids unnecessary operations caused by mistakenly processing a non-control voice signal as a control voice signal.
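By way of illustration only, the trigger logic described above might be sketched as follows. The names `Trigger` and `VoiceControlMode` are hypothetical and do not appear in the original disclosure; the sketch assumes that every trigger instruction or trigger event simply switches the mode on, and that audio is treated as a control voice signal only while the mode is active.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Trigger(Enum):
    """Hypothetical trigger sources; the first two are manual, the last two automatic."""
    KEY_PRESSED = auto()      # a specific key or key combination was pressed
    ICON_TAPPED = auto()      # a specific touch screen icon was tapped
    KEY_FAULT = auto()        # a key fault was detected
    SCREEN_DAMAGED = auto()   # touch screen damage was detected


@dataclass
class VoiceControlMode:
    active: bool = False

    def on_trigger(self, trigger: Trigger) -> None:
        # Assumption: any trigger instruction or trigger event starts the mode.
        self.active = True

    def accept(self, audio: bytes) -> bool:
        # Audio is only treated as a control voice signal while the mode is active.
        return self.active and len(audio) > 0
```

Gating reception on the mode flag is one simple way to keep ordinary speech from being processed as a control voice signal.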

In step S120, the received control voice signal is compared with the speech feature vectors in the speech library to determine the prompt voice signal and the control operation corresponding to the control voice signal.

The prompt voice signal describes the control operation.

In Embodiment One, as shown in Fig. 2, comparing the received control voice signal with the speech feature vectors in the speech library may include the following steps.

In step S122, the control voice signal is converted into a speech pulse sequence.

In step S124, a speech feature vector is extracted from the speech pulse sequence.

In step S126, the extracted speech feature vector is compared with the speech feature vectors in the speech library.

Further, in Embodiment One of the present invention, the method may also include filtering interference signals out of the converted speech pulse sequence before the speech feature vector is extracted from it.

Further, comparing the extracted speech feature vector with the speech feature vectors in the speech library may include quantizing the extracted speech feature vector into a standard speech feature vector and comparing the quantized standard speech feature vector with the speech feature vectors in the speech library.

For example, the processing module of the terminal device (such as a general-purpose processor or a dedicated voice signal processor) may convert the control voice signal into a speech pulse sequence and filter the interference signals out of the speech pulse sequence. A speech feature vector is extracted from the filtered signal, and the extracted speech feature vector is converted into a standard speech feature vector. The standard speech feature vector is compared with the speech feature vectors in the speech library. Each speech feature vector in the speech library has a corresponding prompt voice signal and control operation. The comparison determines the speech feature vector in the speech library that matches the standard speech feature vector, and the prompt voice signal and control operation of the matching speech feature vector are taken as the prompt voice signal and control operation corresponding to the control voice signal. For example, the control voice signal may be "open camera", the corresponding prompt voice signal may be "Do you need to open the camera now?", and the control operation is the operation of opening the camera.

In the above technical solution, filtering out interference signals improves the quality of the speech feature vector and therefore the accuracy of the comparison with the speech feature vectors in the speech library. Quantizing the extracted speech feature vector into a standard speech feature vector further simplifies the comparison with the speech feature vectors in the speech library.
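The processing in steps S122 to S126 can be sketched as follows, purely as an illustration. The original text does not specify the filter, the feature extraction, the quantization, or the matching rule, so everything here is an assumption: a simple amplitude gate stands in for interference filtering, per-frame mean amplitude stands in for the speech feature vector, rounding to a fixed grid stands in for quantization, and Euclidean distance with a tolerance stands in for the comparison. All names and constants are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence

NOISE_FLOOR = 0.05    # hypothetical threshold for the interference filter
NUM_FRAMES = 4        # hypothetical length of the feature vector
LEVELS = 10           # hypothetical number of quantization steps per unit
MAX_DISTANCE = 0.25   # hypothetical matching tolerance


@dataclass
class LibraryEntry:
    vector: List[float]   # stored speech feature vector
    prompt: str           # text of the corresponding prompt voice signal
    operation: str        # name of the corresponding control operation


def filter_interference(pulses: Sequence[float]) -> List[float]:
    """Drop low-amplitude samples (a stand-in for interference filtering)."""
    return [p for p in pulses if abs(p) >= NOISE_FLOOR]


def extract_features(pulses: Sequence[float]) -> List[float]:
    """Mean absolute amplitude over NUM_FRAMES consecutive slices."""
    size = max(1, math.ceil(len(pulses) / NUM_FRAMES))
    frames = [pulses[i * size:(i + 1) * size] for i in range(NUM_FRAMES)]
    return [sum(abs(p) for p in f) / len(f) if f else 0.0 for f in frames]


def quantize(features: Sequence[float]) -> List[float]:
    """Snap each component to a fixed grid to obtain a 'standard' vector."""
    return [round(x * LEVELS) / LEVELS for x in features]


def match(vector: Sequence[float],
          library: Sequence[LibraryEntry]) -> Optional[LibraryEntry]:
    """Return the closest library entry within MAX_DISTANCE, else None."""
    best = min(library, key=lambda e: math.dist(vector, e.vector), default=None)
    if best is None or math.dist(vector, best.vector) > MAX_DISTANCE:
        return None
    return best


def determine(pulses: Sequence[float],
              library: Sequence[LibraryEntry]) -> Optional[LibraryEntry]:
    """Filter, extract, quantize, then compare with the speech library."""
    return match(quantize(extract_features(filter_interference(pulses))), library)
```

A real implementation would use proper acoustic features and a trained matcher; the sketch only shows the order of the stages and how each library entry carries its own prompt and control operation.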

In step S130, the determined prompt voice signal is output.

For example, for the control voice signal "open camera", the determined prompt voice signal corresponding to the control voice signal, "Do you need to open the camera now?", is output through a device such as a loudspeaker. After hearing the prompt voice signal, the user replies with a feedback voice signal that represents confirmation, such as "yes", "OK", or "confirm", if the determined control operation matches the operation the user wants. If the determined control operation does not match the operation the user wants, the user replies with a feedback voice signal that does not represent confirmation, such as "no" or "please cancel".

In step S140, the feedback voice signal in response to the prompt voice signal is received.

After outputting the prompt voice signal, the electronic device may receive the feedback voice signal with which the user replies through a device such as a microphone.

In step S150, it is judged whether the feedback voice signal represents a voice confirmation of the control operation, and the control operation is performed when the feedback voice signal represents a voice confirmation of the control operation.

In Embodiment One, as shown in Fig. 3, judging whether the feedback voice signal represents a voice confirmation of the control operation may include the following steps.

In step S152, the feedback voice signal is compared with the speech feature vectors in the confirmation voice set.

In step S154, when the feedback voice signal matches at least one speech feature vector in the confirmation voice set, it is determined that the feedback voice signal represents a voice confirmation of the control operation.

For example, the confirmation voice set may include speech feature vectors for the various expressions that confirm a control operation. The feedback voice signal is converted into a speech pulse sequence, and the interference signals in the speech pulse sequence are filtered out. A speech feature vector is extracted from the filtered signal, and the extracted speech feature vector is converted into a standard speech feature vector. The standard speech feature vector is compared with the speech feature vectors in the confirmation voice set. If the feedback voice signal matches a speech feature vector in the confirmation voice set, it is determined that the user has confirmed the control operation by voice. The electronic device then starts to perform the control operation.

In Embodiment One, the method may further include cancelling the determined control operation when the feedback voice signal does not represent a voice confirmation of the control operation. For example, after hearing the prompt voice signal, the user replies with a feedback voice signal such as "no" or "please cancel". The feedback voice signal is compared with the speech feature vectors in the confirmation voice set and is found not to match any of them, so it is determined that the feedback voice signal does not represent a voice confirmation of the control operation. The electronic device then decides not to perform the determined control operation and may end the processing flow.
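As an illustrative sketch only, the judgment in steps S152 and S154 and the cancellation branch might look like the following. It assumes the feedback voice signal has already been reduced to a feature vector, and it uses Euclidean distance with a hypothetical tolerance as the matching rule, which the original text does not specify. The function names are likewise hypothetical.

```python
import math
from typing import Callable, Sequence

TOLERANCE = 0.25  # hypothetical matching tolerance


def is_confirmation(feedback_vector: Sequence[float],
                    confirmation_set: Sequence[Sequence[float]]) -> bool:
    """True if the feedback vector matches at least one vector in the set."""
    return any(math.dist(feedback_vector, v) <= TOLERANCE
               for v in confirmation_set)


def handle_feedback(feedback_vector: Sequence[float],
                    confirmation_set: Sequence[Sequence[float]],
                    operation: Callable[[], None]) -> bool:
    """Perform the operation on confirmation; otherwise cancel it."""
    if is_confirmation(feedback_vector, confirmation_set):
        operation()      # confirmed: perform the control operation
        return True
    return False         # not confirmed: the operation is never invoked
```

Note that cancellation here is simply the absence of a match: any reply that does not match the confirmation voice set, including "no" or unrelated speech, leaves the control operation unperformed.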

With the technical solution of Embodiment One, the electronic device can be controlled by voice when its touch screen is damaged or its keys have failed, which solves the problem that the device cannot be used in an emergency. The voice confirmation also ensures that the control operation performed is the one the user intends, which further improves the accuracy of the control operation.

Fig. 4 is a flow chart of the voice control method according to Embodiment Two of the present invention. As shown in Fig. 4, in Embodiment Two the method may include the following steps.

The user presses the voice control mode trigger key (for example, the volume up and volume down keys), which generates a trigger instruction. In step S402, the trigger instruction is received and voice control mode is started.

After voice control mode is entered, the user inputs the voice "dial" through the microphone, so in step S404 the control voice signal is received in voice control mode.

In step S406, the received control voice signal is converted into a speech pulse sequence, the interference signals in the speech pulse sequence are filtered out, a speech feature vector is extracted from the filtered signal, and the extracted speech feature vector is converted into a standard speech feature vector.

In step S408, the standard speech feature vector is compared with the speech feature vectors in the speech library to determine the speech feature vector in the speech library that matches the standard speech feature vector. Each speech feature vector in the speech library has a corresponding prompt voice signal and control operation. For example, the prompt voice signal corresponding to the speech feature vector that matches the control voice signal "dial" may be "Do you need to dial now?", and the control operation is the dialing operation.

In step S410, the prompt voice signal and control operation of the matching speech feature vector are taken as the prompt voice signal and control operation corresponding to the control voice signal. The prompt voice signal describes the control operation.

In step S412, the determined prompt voice signal is output, for example "Do you need to dial now?". After hearing the prompt voice signal, the user answers "yes" if the control operation is the desired operation and "no" if it is not.

In step S414, the feedback voice signal is then received.

In step S416, the feedback voice signal is compared with the speech feature vectors in the confirmation voice set to judge whether the feedback voice signal matches at least one speech feature vector in the confirmation voice set. If it does, the feedback voice signal represents a voice confirmation of the control operation and step S418 is performed. Otherwise, the feedback voice signal does not confirm the control operation and step S420 is performed. The process of comparing the feedback voice signal with the speech feature vectors in the confirmation voice set is similar to the process of comparing the standard speech feature vector with the speech feature vectors in the speech library and is not repeated here.

In step S418, the control operation is performed. For example, the dialing operation is carried out.

In step S420, the control operation is cancelled. For example, the dialing operation is not carried out, and the flow returns to the stage of inputting a control voice signal or exits voice control mode.
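The flow of steps S404 to S420 can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: plain text strings stand in for voice signals and speech feature vectors, dictionary lookup stands in for the comparison with the speech library, and the handling of an unmatched command (returning without a prompt) is an assumption, since the original text does not describe that case. `SPEECH_LIBRARY`, `CONFIRMATIONS`, and `run_voice_control` are hypothetical names.

```python
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical speech library: command -> (prompt text, control operation name)
SPEECH_LIBRARY: Dict[str, Tuple[str, str]] = {
    "dial": ("Do you need to dial now?", "dial"),
    "open camera": ("Do you need to open the camera now?", "open_camera"),
}

# Hypothetical confirmation voice set
CONFIRMATIONS: Set[str] = {"yes", "ok", "confirm"}


def run_voice_control(listen: Callable[[], str],
                      speak: Callable[[str], None],
                      execute: Callable[[str], None]) -> Optional[str]:
    """Run one pass of S404 to S420; return the executed operation or None."""
    command = listen().strip().lower()      # S404/S406: receive and normalize
    entry = SPEECH_LIBRARY.get(command)     # S408: compare with the library
    if entry is None:
        return None                         # assumption: no match, nothing to do
    prompt, operation = entry               # S410: prompt and control operation
    speak(prompt)                           # S412: output the prompt
    feedback = listen().strip().lower()     # S414: receive the feedback
    if feedback in CONFIRMATIONS:           # S416: judge confirmation
        execute(operation)                  # S418: perform the operation
        return operation
    return None                             # S420: cancel the operation
```

Passing `listen`, `speak`, and `execute` as callables keeps the sketch independent of any particular microphone, loudspeaker, or telephony interface.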

In this way, processing the input control voice signal and feedback voice signal improves the accuracy of matching against the speech feature vectors; controlling the electronic device by voice solves the problem that the device cannot be used in an emergency; and the voice confirmation ensures that the control operation performed is the one the user intends, which further improves the accuracy of the control operation.

Fig. 5 is a structural diagram of the voice control apparatus according to Embodiment Three of the present invention. The apparatus can be used in various electronic devices, such as smartphones and tablet computers. As shown in Fig. 5, in Embodiment Three the apparatus may include the following modules.

A receiving module 510, used to receive a control voice signal;

A processing module 520, used to compare the received control voice signal with the speech feature vectors in the speech library to determine the prompt voice signal and control operation corresponding to the control voice signal, wherein the prompt voice signal describes the control operation;

An output module 530, used to output the determined prompt voice signal;

The receiving module 510 is further used to receive the feedback voice signal in response to the prompt voice signal;

The processing module 520 is further used to judge whether the feedback voice signal represents a voice confirmation of the control operation, and to perform the control operation when the feedback voice signal represents a voice confirmation of the control operation.

In Embodiment Three, the processing module 520 may be used to convert the control voice signal into a speech pulse sequence, extract a speech feature vector from the speech pulse sequence, and compare the extracted speech feature vector with the speech feature vectors in the speech library.

In Embodiment Three, the processing module 520 may further be used to filter interference signals out of the converted speech pulse sequence before the speech feature vector is extracted from it.

In Embodiment Three, the processing module 520 may be used to quantize the extracted speech feature vector into a standard speech feature vector and to compare the quantized standard speech feature vector with the speech feature vectors in the speech library.

In Embodiment Three, the processing module 520 may be used to compare the feedback voice signal with the speech feature vectors in the confirmation voice set and, when the feedback voice signal matches at least one speech feature vector in the confirmation voice set, to determine that the feedback voice signal represents a voice confirmation of the control operation.

In Embodiment Three, the processing module 520 may further be used to cancel the determined control operation when the feedback voice signal does not represent a voice confirmation of the control operation.

In Embodiment Four, as shown in Fig. 6, the apparatus may further include a trigger module 610, used to trigger the start of voice control mode after a trigger instruction is received or a trigger event is detected.

The above apparatus corresponds to the foregoing method. For its specific implementation, refer to the detailed description of the foregoing method, which is not repeated here.
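One way to picture the division of work among modules 510, 520, 530, and 610 is the following sketch. It is illustrative only: the class names, the use of text strings in place of voice signals, and the dictionary-based speech library are all assumptions made for brevity, and folding the trigger module into a `trigger` method on the device is a simplification of the separate module shown in Fig. 6.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

Operation = Callable[[], None]


@dataclass
class ReceivingModule:                      # role of module 510
    source: Callable[[], str]               # hypothetical audio source

    def receive(self) -> str:
        return self.source()


@dataclass
class OutputModule:                         # role of module 530
    spoken: List[str] = field(default_factory=list)

    def output(self, prompt: str) -> None:
        self.spoken.append(prompt)          # stands in for a loudspeaker


@dataclass
class ProcessingModule:                     # role of module 520
    library: Dict[str, Tuple[str, Operation]]
    confirmations: Set[str]

    def determine(self, control: str) -> Optional[Tuple[str, Operation]]:
        return self.library.get(control)

    def confirm_and_execute(self, feedback: str, operation: Operation) -> bool:
        if feedback in self.confirmations:
            operation()                     # perform the control operation
            return True
        return False                        # cancel the control operation


@dataclass
class VoiceControlDevice:
    receiver: ReceivingModule
    processor: ProcessingModule
    output: OutputModule
    voice_mode: bool = False

    def trigger(self) -> None:              # role of trigger module 610
        self.voice_mode = True

    def run_once(self) -> bool:
        if not self.voice_mode:
            return False
        entry = self.processor.determine(self.receiver.receive())
        if entry is None:
            return False                    # assumption: unmatched input is ignored
        prompt, operation = entry
        self.output.output(prompt)
        return self.processor.confirm_and_execute(self.receiver.receive(), operation)
```

The same receiving module is used for both the control voice signal and the feedback voice signal, mirroring the description of module 510 above.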

Preferred implementations of the embodiments of the present invention have been described in detail above with reference to the accompanying drawings. However, the embodiments of the present invention are not limited to the specific details of the above implementations. Within the scope of the technical concept of the embodiments of the present invention, many simple variations can be made to the technical solutions, and these simple variations all fall within the protection scope of the embodiments of the present invention.

It should also be noted that the specific technical features described in the above detailed description can be combined in any suitable manner, provided they do not contradict one another. To avoid unnecessary repetition, the embodiments of the present invention do not separately describe the various possible combinations.

Those skilled in the art will understand that all or part of the steps of the methods in the above embodiments can be carried out by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. The storage medium includes various media that can store program code, such as a USB flash drive, a portable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

In addition, the various embodiments of the present invention may also be combined in any manner. As long as a combination does not depart from the idea of the embodiments of the present invention, it should likewise be regarded as content disclosed by the embodiments of the present invention.

Claims

1. A voice control method, the method comprising:

receiving a control voice signal;

comparing the received control voice signal with speech feature vectors in a speech library to determine a prompt voice signal and a control operation corresponding to the control voice signal, wherein the prompt voice signal describes the control operation;

outputting the determined prompt voice signal;

receiving a feedback voice signal in response to the prompt voice signal; and

judging whether the feedback voice signal represents a voice confirmation of the control operation, and performing the control operation when the feedback voice signal represents a voice confirmation of the control operation.

2. The method according to claim 1, characterized in that the method further comprises:

triggering the start of a voice control mode after a trigger instruction is received or a trigger event is detected.

3. The method according to claim 1, characterized in that comparing the received control voice signal with the speech feature vectors in the speech library comprises:

converting the control voice signal into a speech pulse sequence;

extracting a speech feature vector from the speech pulse sequence; and

comparing the extracted speech feature vector with the speech feature vectors in the speech library.

4. The method according to claim 3, characterized in that the method further comprises:

before the speech feature vector is extracted from the speech pulse sequence, filtering interference signals out of the converted speech pulse sequence.

5. The method according to claim 3, characterized in that comparing the extracted speech feature vector with the speech feature vectors in the speech library comprises:

quantizing the extracted speech feature vector into a standard speech feature vector; and

comparing the quantized standard speech feature vector with the speech feature vectors in the speech library.

6. The method according to claim 1, characterized in that judging whether the feedback voice signal represents a voice confirmation of the control operation comprises:

comparing the feedback voice signal with the speech feature vectors in a confirmation voice set; and

when the feedback voice signal matches at least one speech feature vector in the confirmation voice set, determining that the feedback voice signal represents a voice confirmation of the control operation.

7. The method according to claim 1, characterized in that the method further comprises:

cancelling the determined control operation when the feedback voice signal does not represent a voice confirmation of the control operation.

8. A voice control apparatus, the apparatus comprising:

a receiving module, used to receive a control voice signal;

a processing module, used to compare the received control voice signal with speech feature vectors in a speech library to determine a prompt voice signal and a control operation corresponding to the control voice signal, wherein the prompt voice signal describes the control operation; and

an output module, used to output the determined prompt voice signal;

wherein the receiving module is further used to receive a feedback voice signal in response to the prompt voice signal; and

the processing module is further used to judge whether the feedback voice signal represents a voice confirmation of the control operation, and to perform the control operation when the feedback voice signal represents a voice confirmation of the control operation.

9. The apparatus according to claim 8, characterized in that the apparatus further comprises:

a trigger module, used to trigger the start of a voice control mode after a trigger instruction is received or a trigger event is detected.

10. The apparatus according to claim 8, characterized in that the processing module is used to convert the control voice signal into a speech pulse sequence, extract a speech feature vector from the speech pulse sequence, and compare the extracted speech feature vector with the speech feature vectors in the speech library.

11. The apparatus according to claim 10, characterized in that the processing module is further used to filter interference signals out of the converted speech pulse sequence before the speech feature vector is extracted from the speech pulse sequence.

12. The apparatus according to claim 10, characterized in that the processing module is used to quantize the extracted speech feature vector into a standard speech feature vector and to compare the quantized standard speech feature vector with the speech feature vectors in the speech library.

13. The apparatus according to claim 8, characterized in that the processing module is used to compare the feedback voice signal with the speech feature vectors in a confirmation voice set and, when the feedback voice signal matches at least one speech feature vector in the confirmation voice set, to determine that the feedback voice signal represents a voice confirmation of the control operation.

14. The apparatus according to claim 8, characterized in that the processing module is further used to cancel the determined control operation when the feedback voice signal does not represent a voice confirmation of the control operation.