US20010029449A1 - Apparatus and method for recognizing voice with reduced sensitivity to ambient noise - Google Patents
- Publication number
- US20010029449A1
- Authority
- US
- United States
- Prior art keywords
- voice
- accordance
- threshold value
- ambient noise
- level
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/10—Speech classification or search using distance or distortion measures between unknown speech and reference templates
Definitions
- the present invention relates to an apparatus and method for recognizing voice. More specifically, the present invention relates to an apparatus and method for recognizing voice without influence of ambient noise.
- a voice recognition system including two microphones is disclosed. Voice to be recognized is inputted to one of the microphones and ambient noise is inputted to the other of the microphones and a voice signal is inputted to a recognition unit to be spectrum-analyzed and an ambient noise signal is inputted to a noise measuring unit such that strength thereof is measured. When the strength of the ambient noise exceeds a predetermined value, a threshold value is subtracted from a recognition result signal from the recognition unit in a noise rejection unit.
- a principal object of the present invention is to provide a novel apparatus and method for recognizing voice.
- Another object of the present invention is to provide an apparatus and method for recognizing voice in which it is possible to further reduce influence of ambient noise.
- Another object of the present invention is to provide an apparatus and method for recognizing voice in which it is possible to correctly and surely recognize a voice even if a level of ambient noise varies.
- Another object of the present invention is to provide an apparatus and method for recognizing voice in which it is possible to register a reference pattern without influence of ambient noise.
- a voice recognizing apparatus in accordance with the present invention comprises a microphone for inputting voice; sampling means for sampling a voice signal from the microphone exceeding a threshold value; and changing means for changing the threshold value in accordance with a level of ambient noise.
- a voice recognizing method in accordance with the present invention comprises steps of: (a) detecting a level of ambient noise; (b) variably setting a threshold level in response to a level of detected ambient noise; and (c) detecting a boundary of a voice signal inputted from a microphone in accordance with the threshold value.
- since a threshold value for sampling the voice signal is changed in accordance with a level of the ambient noise, it is possible to correctly recognize the voice inputted from the microphone without influence of the ambient noise even if the level of the ambient noise varies.
- if the present invention is utilized for registration of a reference pattern, even if such a reference pattern is registered under noisy circumstances, it is possible to prevent a reference pattern which is modified by the ambient noise from being registered. Therefore, it is possible to recognize the voice with accuracy.
- the ambient noise is generated from a loudspeaker by an audio signal from an audio equipment, and therefore, as a signal representative of the ambient noise, the audio signal which is directly inputted from the audio equipment is utilized.
- not only is a further microphone for converting the ambient noise into an electrical signal not required, but also the ambient noise level can be reliably detected.
- the ambient noise may be inputted to the further microphone as sound.
- a voice recognizing apparatus comprises: a microphone for inputting voice to be recognized; first sampling means for sampling a feature parameter of a voice signal from the microphone at every frame with a predetermined time interval; first converting means for converting the feature parameter sampled by the first sampling means into first feature parameter data; first memory means for storing the first feature parameter data outputted from the first converting means for a plurality of frames; first reading means for reading a series of first feature parameter data exceeding a threshold value from the first memory means; noise level detecting means for detecting a level of ambient noise; threshold value setting means for variably setting the threshold value in response to an amplitude of the ambient noise level; second sampling means for sampling a feature parameter of a signal representative of the ambient noise at every frame with a predetermined time interval; second converting means for converting the feature parameter sampled by the second sampling means into second feature parameter data; second memory means for storing the second feature parameter data outputted from the second converting means for a plurality of frames; second reading means for reading the second feature
- the feature parameter of the noise is eliminated from the feature parameter of the voice signal inputted from the microphone, and therefore, a feature parameter pattern for recognition or registration is not affected by the noise.
- FIG. 1 is a block diagram showing a stereo for automobile as one embodiment in accordance with the present invention.
- FIG. 2 is an illustrative view showing a memory map of a memory in FIG. 1 embodiment.
- FIGS. 3 A- 3 G are flowcharts showing an operation of FIG. 1 embodiment.
- FIG. 4 is a waveform chart showing a state where a boundary of a voice signal is sampled in FIG. 1 embodiment.
- a stereo for automobile 10 which is one embodiment in accordance with the present invention includes a microcomputer 12 by which an audio portion 14 is controlled.
- the audio portion 14 comprises a stereo sound source 16 including a tuner 18 , a tape deck 20 , a CD player 22 , etc. A right signal R and a left signal L from the stereo sound source 16 are respectively applied, through amplifiers 24 R and 24 L, to loudspeakers 26 R and 26 L which are arranged at suitable positions in an interior of an automobile (not shown).
- if the stereo sound source 16 is a 4-channel stereo, rear signals are further outputted.
- a controller 28 is further included in the audio portion 14 , and the controller 28 comprises operation switches (not shown) for manually operating the stereo sound source 16 .
- a voice input switch 30 provided on the audio portion 14 is operated.
- control signals from the microcomputer 12 are inputted to the stereo sound signals generating apparatus 16 .
- a microphone 32 for picking-up voice of a driver for controlling the audio portion 14 is arranged on a dashboard (not shown) of the automobile.
- a voice signal from the microphone 32 is given to a filter bank 34 .
- the filter bank 34 includes bandpass filters of 8 channels, and therefore, feature parameters of the voice signal inputted from the microphone 32 are extracted by the bandpass filters.
- the filter bank 34 comprises a preamplifier, automatic gain control, bandpass filter, rectifying circuit and a lowpass filter for each channel.
- Respective feature parameters (analog signals) from the filter bank 34 are inputted to a multiplexer 36 .
- the multiplexer 36 time-sequentially outputs the feature parameters of 8 channels inputted from the filter bank 34 .
- the feature parameters of the voice signal outputted from the multiplexer 36 are converted into feature parameter data by an A/D converter 38 .
- the right signal R and the left signal L (and rear signals, if any) from the stereo sound source 16 included in the audio portion 14 are added to each other by an adder 40 , and a signal from the adder 40 is applied to a terminal 42 as an electrical signal representative of ambient noise.
- a sound signal is directly applied to the terminal 42 from the audio portion 14 .
- the stereo sound signals from the audio portion 14 are generated as sound from the loudspeakers 26 R and 26 L , and the sound is inputted to the microphone 32 as the ambient noise. In the embodiment shown, by directly inputting the sound signal to the terminal 42 from the audio portion 14 , the sound generated from the audio portion 14 is regarded and handled as the ambient noise.
- the sound signal (noise signal) inputted to the above described terminal 42 is applied to a filter bank 46 having structure similar to that of the above described filter bank 34 through an attenuator 44 .
- Feature parameters (analog signals) of respective frequency bands from the filter bank 46 are inputted to a multiplexer 48 .
- the multiplexer 48 further receives a noise signal from the attenuator 44 as it is, and time-sequentially outputs the feature parameters of 8 channels inputted from the filter bank 46 or a whole noise signal from the attenuator 44 .
- the feature parameters of the noise and the whole noise signal outputted from the multiplexer 48 are converted into digital data by an A/D converter 50 .
- the noise signal from the terminal 42 is sampled and inputted as the feature parameter data.
- a signal from the above described voice input switch 30 and outputs of the A/D converters 38 and 50 are inputted to the above described microcomputer 12 through an input port 52 .
- the microcomputer 12 recognizes the voice inputted from the microphone 32 by comparing the parameters inputted from the input port 52 with respective reference patterns in a reference pattern table formed in the memory 54 as described later. Then, in accordance with a recognition result, the microcomputer 12 outputs the afore mentioned control signals to the audio portion 14 through an output port 56 .
- the control signal is outputted from the microcomputer 12 .
- the controller 28 controls the stereo sound source 16 .
- the memory 54 includes, as shown in FIG. 2, a reference pattern table 54 a in which the reference patterns of feature parameters of respective pronunciations or words for recognizing the voice based upon the feature parameter sampled by the filter bank 34 are set in advance.
- the reference pattern table 54 a is constructed by a backed-up RAM, for example.
- a power data pattern table 54 b and a threshold value table 54c are further assigned.
- power data patterns of 9 sets in total are set in advance in accordance with 8 noise levels 1-8, and in correspondence with the power data patterns set in the power data pattern table 54 b , threshold value data for respective noise levels are set in the threshold value table 54 c.
- the threshold value data set in the threshold value table 54 c are, for example, one and a half times the data set in the power data pattern table 54 b .
- the sound power data is calculated as a weighted mean value in a learning mode. If such power data were used as the threshold value data as it is, large noise inputted from the microphone 32 would be sampled as voice, and such malfunction must be prevented as far as possible.
- the pattern table 54 b and the threshold value table 54 c may be constructed by a backed-up RAM or a ROM.
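As a minimal sketch of how the two tables relate, the snippet below derives threshold values by scaling stored power data by 1.5, the example factor given above. The table contents, the function name `build_threshold_table`, and the use of Python lists are illustrative assumptions, not values from the embodiment.

```python
# Hypothetical sketch: derive a threshold table from a power data pattern table.
MARGIN = 1.5  # example factor from the text ("one and a half times")


def build_threshold_table(power_patterns, margin=MARGIN):
    """Return one threshold pattern per power data pattern.

    power_patterns: list of patterns, each a list of power values indexed
    by noise level (8 levels per pattern in the embodiment).
    """
    return [[margin * p for p in pattern] for pattern in power_patterns]


# Illustrative numbers only (2 patterns x 8 noise levels), not from the patent.
power_table = [
    [10, 12, 14, 16, 18, 20, 22, 24],
    [20, 24, 28, 32, 36, 40, 44, 48],
]
threshold_table = build_threshold_table(power_table)
```

Keeping the margin as a parameter makes it easy to experiment with how far above the learned voice power the threshold should sit.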
- the memory 54 further includes a noise level buffer 54 d, voice parameter buffer 54 e, noise parameter buffer 54 f and a threshold value buffer 54 g.
- Each of the buffers 54 d - 54 g has a plurality of addresses so that a series of data for a plurality of frames can be stored. In addition, one frame is set as 5 milliseconds, for example.
- the noise level buffer 54 d stores frame by frame data representative of levels of the ambient noise which are applied from the attenuator 44 and the multiplexer 48 and converted into digital data by the A/D converter 50 .
- the voice parameter buffer 54 e stores frame by frame the feature parameter data of the voice inputted from the microphone 32 which are outputted from the A/D converter 38 .
- the noise parameter buffer 54 f stores frame by frame the feature parameter data of the noise signal inputted to the terminal 42 which are outputted from the A/D converter 50 .
- the threshold value data buffer 54 g stores frame by frame the threshold value data for sampling which are variably set as described later.
- the memory 54 further includes a power data buffer 54 h having addresses corresponding to respective noise levels and a power data buffer 54 i having only one address.
- the power data buffer 54 h is used in determining that the pattern of the voice power data is most similar to any one of the patterns of the power data pattern table 54 b to decide a threshold value in the learning mode described later.
- the power data buffer 54 i is utilized in determining a threshold value in the recognition mode or registration mode described later when the noise level is small.
- the memory 54 includes a head address register 54 j and a tail address register 54 k.
- in the head address register 54 j , data representative of an address of the voice parameter buffer 54 e which stores a head of a series of feature parameter data exceeding the threshold value is stored.
- in the tail address register 54 k , data representative of an address of the voice parameter buffer 54 e which stores a tail of the series of feature parameter data exceeding the threshold value is stored.
- in a first step S 1 of FIG. 3A, the microcomputer 12 determines on the basis of a signal from the input port 52 whether or not the voice input switch 30 of the audio portion 14 is turned on.
- the learning mode which is a mode other than the recognition mode wherein the audio portion 14 is controlled by a voice input to the microphone 32 or the registration mode for registering a voice input from the microphone 32 is set.
- the learning mode is a mode for preliminarily setting a threshold value for sampling the voice signal prior to the recognition mode or registration mode.
- in a step S 2 , data representative of a level of a whole noise signal, which is inputted to the A/D converter 50 without passing through the filter bank 46 and converted into digital data therein, is written in the noise level buffer 54 d .
- in a step S 3 , power data is read from an address of the power data buffer 54 h corresponding to the noise level data.
- the power data can be evaluated by summing the feature parameters stored in the voice parameter buffer 54 e .
- in a step S 4 , a weighted mean value of the read power data and the power data currently being inputted is calculated.
- the new weighted mean value P n+1 thus evaluated is re-stored in an address of the power data buffer 54 h corresponding to the noise level at that time.
- the power data buffer is renewed at every timing when the noise level data is inputted.
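The update in steps S 2 to S 4 can be sketched as a running weighted mean kept per noise level. The weights `w_old` and `w_new`, the function name, and the sample values are assumptions; the text here only states that a weighted mean of the stored and current power data is calculated and written back.

```python
# Hypothetical sketch of steps S2-S4: one running power value per noise level.
NUM_NOISE_LEVELS = 8


def update_power_buffer(power_buffer, noise_level, current_power,
                        w_old=3.0, w_new=1.0):
    """Blend the stored power for `noise_level` with the current power.

    w_old and w_new are assumed weights, not values from the source.
    """
    idx = noise_level - 1  # noise levels are numbered 1-8
    stored = power_buffer[idx]
    power_buffer[idx] = (w_old * stored + w_new * current_power) / (w_old + w_new)
    return power_buffer[idx]


power_buffer_54h = [0.0] * NUM_NOISE_LEVELS
update_power_buffer(power_buffer_54h, noise_level=3, current_power=40.0)
update_power_buffer(power_buffer_54h, noise_level=3, current_power=40.0)
```

Only the entry for the noise level observed at that moment changes, which matches the idea of one address per noise level in the power data buffer.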
- in a next step S 5 , the microcomputer 12 determines again whether or not the voice input switch 30 is turned on. In a case where the voice input switch 30 is not turned on, the above described steps S 2 to S 4 are repeatedly executed.
- in a step S 6 , the pattern of the power data which was calculated and stored in the previous step S 4 is read from the power data buffer 54 h.
- the power data pattern which is most similar to the power data pattern as read is selected from the power data pattern table 54 b .
- a selection method is as follows: a total sum of the power data is evaluated for each of the 9 sets of the power data set in advance in the power data pattern table 54 b , the current power data total sum is subtracted from each of the 9 total sums, and the one of the 9 power data patterns for which the resulting numerical value becomes smallest is selected.
- a threshold value corresponding to a noise level inputted in the step S 2 is determined. More specifically, a threshold value pattern corresponding to the power data pattern selected in the step S 7 is selected from the threshold value table 54 c , and a threshold value corresponding to the noise level at that time is selected from the threshold values of respective noise levels included in a selected threshold value pattern and a selected threshold value is preliminarily set as a threshold value for the recognition mode or registration mode. That is, in the learning mode, in accordance with the pattern of the power of the voice inputted from the microphone 32 , the threshold value is variably set in accordance with an amplitude of the noise level. Thus, the learning mode is executed.
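The pattern selection and threshold lookup described above might look like the following sketch. Comparing total sums and taking the absolute difference is one reading of the selection method; the function names, the 4-level tables, and the numbers are hypothetical.

```python
# Hypothetical sketch of steps S6-S8: pick the closest stored pattern by total
# power, then read the threshold for the current noise level.

def select_pattern(measured, power_patterns):
    """Return the index of the stored pattern whose total power is closest.

    Using the absolute difference of total sums is an assumption.
    """
    measured_sum = sum(measured)
    return min(range(len(power_patterns)),
               key=lambda i: abs(sum(power_patterns[i]) - measured_sum))


def lookup_threshold(measured, noise_level, power_patterns, threshold_patterns):
    """Threshold for `noise_level` (1-based) from the best-matching pattern."""
    best = select_pattern(measured, power_patterns)
    return threshold_patterns[best][noise_level - 1]


# Illustrative tables with 3 patterns x 4 noise levels (not from the patent).
power_patterns = [[10, 12, 14, 16], [20, 24, 28, 32], [40, 44, 48, 52]]
threshold_patterns = [[1.5 * p for p in row] for row in power_patterns]
measured = [19, 25, 27, 33]
best_index = select_pattern(measured, power_patterns)
threshold = lookup_threshold(measured, 2, power_patterns, threshold_patterns)
```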
- the microcomputer 12 determines whether or not the registration mode is set in a step S 9 . Then, if not the registration mode, the recognition mode will be executed.
- in a first step S 10 of the recognition mode as shown in FIG. 3B, the microcomputer 12 starts sampling of the voice input from the microphone 32 and the noise level from the terminal 42 . Then, in a step S 11 , sampled noise level data, sampled voice parameter data and sampled noise parameter data are respectively stored in the noise level buffer 54 d, the voice parameter buffer 54 e and the noise parameter buffer 54 f . In a step S 12 , the microcomputer 12 determines whether or not the noise level inputted at that time is small.
- the microcomputer 12 reads the noise level data of the past frames (for example, 10 frames) from the noise level buffer 54 d . Then, in a step S 14 , a threshold value is determined by calculating a weighted mean value of the noise levels. That is, on the basis of consideration similar to that of the previous equation, a weighted mean value of the noise levels is calculated, and in accordance with a noise level thus obtained, a threshold value corresponding to the noise level is read from the threshold value table 54 c . However, in order to set a threshold value which better matches the current noise situation, the weight of the current noise level is made heavier than that of the past noise levels.
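A minimal sketch of this step, assuming each past frame has weight 1.0 and the current frame a larger weight, is shown below. The weight value, the rounding to the nearest table entry, and the threshold numbers are all assumptions made for illustration.

```python
# Hypothetical sketch of the weighted noise level and the table lookup.

def weighted_noise_level(past_levels, current_level, current_weight=2.0):
    """Weighted mean of noise levels with extra weight on the current frame.

    current_weight > 1.0 stands in for "heavier than the past noise levels".
    """
    total = sum(past_levels) + current_weight * current_level
    return total / (len(past_levels) + current_weight)


def threshold_for_level(level, threshold_by_level):
    """Round to the nearest table entry (levels 1..N) and read the threshold."""
    idx = min(max(int(round(level)), 1), len(threshold_by_level)) - 1
    return threshold_by_level[idx]


past = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]      # 10 past frames
level = weighted_noise_level(past, current_level=8)
thresholds = [15, 18, 21, 24, 27, 30, 33, 36]  # illustrative values
thr = threshold_for_level(level, thresholds)
```

With these numbers a sudden jump to level 8 moves the averaged level from 2 to 3, so the threshold follows the change without tracking a single noisy frame exactly.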
- in a step S 15 , the microcomputer 12 determines whether or not the noise level is currently increasing by comparing the data of the current noise level with the data of the immediately preceding frame which is stored in the noise level buffer 54 d.
- in a step S 16 , the microcomputer 12 calculates a weighted mean value of the threshold values of the past frames stored in the threshold value buffer 54 g and the threshold value of the current frame evaluated in the previous step S 14 .
- a weighting coefficient of the threshold value of the current frame is, for example, “1.0”
- a weighting coefficient of the past threshold value is, for example, “0.5”
- in a step S 17 , the microcomputer 12 calculates a weighted mean value of the threshold values of the past frames and the threshold value of the current frame.
- a weighting coefficient of the threshold value of the current frame is, for example, “0.5”
- a weighting coefficient of the past threshold value is, for example, “1.0”
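The two weightings can be sketched as a single smoothing function. Which branch applies when the noise level is rising is not stated explicitly in this passage; the sketch assumes the current frame gets the heavier weight while the noise is increasing, and the function name and sample values are hypothetical.

```python
# Hypothetical sketch of steps S15-S17: asymmetric smoothing of the threshold.

def smooth_threshold(past_threshold, current_threshold, noise_rising):
    """Weighted mean of the past and current thresholds.

    Assumption: weights (current 1.0, past 0.5) apply while the noise level
    is rising, and (current 0.5, past 1.0) otherwise.
    """
    w_current, w_past = (1.0, 0.5) if noise_rising else (0.5, 1.0)
    weighted = w_current * current_threshold + w_past * past_threshold
    return weighted / (w_current + w_past)


rising = smooth_threshold(past_threshold=30.0, current_threshold=60.0,
                          noise_rising=True)
falling = smooth_threshold(past_threshold=60.0, current_threshold=30.0,
                           noise_rising=False)
```

Under this assumption the threshold climbs quickly when noise grows (30 to 50 in one frame here) and decays slowly when noise drops (60 to 50).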
- if “YES” is determined in the previous step S 12 , that is, if it is determined that the noise level is small
- the microcomputer 12 sets a threshold value on the basis of the power data of the past frames. More specifically, a simple mean value of the voice power of the past frames is calculated by the microcomputer 12 and stored in the power data buffer 54 i (FIG. 2), and a power approximately one and a half times the power stored in the power data buffer 54 i is set as a threshold value for sampling.
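For the low-noise case, a short sketch under the stated example factor of 1.5 follows; the function name and the sample powers are hypothetical.

```python
# Hypothetical sketch: threshold from the simple mean of past voice power.

def low_noise_threshold(past_powers, margin=1.5):
    """Return margin times the simple mean of the past frame powers."""
    mean_power = sum(past_powers) / len(past_powers)
    return margin * mean_power


threshold = low_noise_threshold([10.0, 20.0, 30.0])
```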
- a threshold value for sampling the voice to be recognized is determined in accordance with an amplitude of the noise level inputted to the terminal 42 from the audio portion 14 .
- the threshold value thus set is stored in the threshold value buffer 54 g.
- the microcomputer 12 determines whether or not the feature parameter of the voice being stored in the voice parameter buffer 54 e exceeds the threshold value set as described above. If the feature parameter of a given frame exceeds the threshold value, since the frame is a head frame of the voice or word to be recognized, in a next step S 20 , the microcomputer 12 loads an address of the voice parameter buffer 54 e in which the feature parameter of that frame is stored to the head address register 54 j as the head address. In FIG. 4 example, an address of the frame Fh′ becomes the head address.
- the microcomputer 12 determines whether or not the feature parameter being stored in the voice parameter buffer 54 e becomes below the threshold value. If the feature parameter of a given frame becomes below the threshold value, since the frame is a tail frame of the voice or word to be recognized, in a next step S 22 , the microcomputer 12 loads an address of the voice parameter buffer 54 e in which the voice feature parameter data of that frame is stored to the tail address register 54 k as the tail address. In FIG. 4 example, an address of the frame Ft′ becomes the tail address.
- the feature parameters of the succeeding frames from Fh′ to Ft′ are provisionally sampled as the feature parameters of the voice to be recognized.
- in next steps S 23 and S 24 , the microcomputer 12 seeks a true head address and a true tail address, respectively. The threshold value previously set is determined in accordance with an amplitude of the noise level; if the noise level is large, the threshold value is also large, and there is a concern that a head and a tail of the voice cannot be sampled correctly. Therefore, in the step S 23 , by searching for a frame in which the voice power was minimum out of the frames before the frame indicated by the head address determined in the step S 20 , a true head of the voice to be recognized is sought. In FIG. 4 example, an address of the frame Fh becomes the true head address.
- in the step S 24 , by searching for a frame in which the voice power was minimum out of the frames after the frame indicated by the tail address determined in the step S 22 , a true tail of the voice to be recognized is sought.
- an address of the frame Ft becomes the true tail address.
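The provisional and true boundary search can be sketched as follows. The per-frame scalar power, the search window of 5 frames, the function names, and the sample data are assumptions; the text does not say how far before the head or after the tail the minimum is searched.

```python
# Hypothetical sketch: provisional head/tail, then refinement to the true
# head/tail by searching for the minimum-power frame nearby.

def find_boundaries(power, thresholds):
    """First frame above its threshold, then first later frame below it."""
    head = next((i for i, (p, t) in enumerate(zip(power, thresholds)) if p > t),
                None)
    if head is None:
        return None
    tail = next((i for i in range(head + 1, len(power))
                 if power[i] < thresholds[i]), None)
    if tail is None:
        return None
    return head, tail


def refine_boundaries(power, head, tail, window=5):
    """Minimum-power frame before the head and after the tail.

    `window` limits the search range and is an assumption.
    """
    lo = max(0, head - window)
    true_head = min(range(head - 1, lo - 1, -1),
                    key=lambda i: power[i], default=head)
    hi = min(len(power) - 1, tail + window)
    true_tail = min(range(tail + 1, hi + 1),
                    key=lambda i: power[i], default=tail)
    return true_head, true_tail


power = [3, 1, 2, 6, 9, 12, 11, 8, 5, 2, 1, 3]  # illustrative frame powers
thresholds = [7] * len(power)
provisional = find_boundaries(power, thresholds)
true_bounds = refine_boundaries(power, *provisional)
```

In this example the threshold clips the rising and falling edges of the utterance, and the refinement widens the span from frames 4 to 8 out to frames 1 to 10.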
- in a step S 25 , the microcomputer 12 determines whether or not a time period from the frame indicated by the true head address to the frame indicated by the true tail address is within a proper length, for example, 0.3-1.5 seconds. A value of this time period is set experimentally, and thus may be changed suitably. If the time period is not proper, the process returns to the previous step S 10 (FIG. 3B) without execution of the following recognition operation.
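With the example frame length of 5 milliseconds, the 0.3-1.5 second window corresponds to 60-300 frames. The check can be sketched as below; counting both end frames as part of the span and the function name are assumptions.

```python
# Hypothetical sketch of the duration check, in integer milliseconds.
FRAME_MS = 5  # example frame length from the text


def has_valid_duration(true_head, true_tail, min_ms=300, max_ms=1500,
                       frame_ms=FRAME_MS):
    """True if the span from true head to true tail lasts 0.3-1.5 seconds."""
    duration_ms = (true_tail - true_head + 1) * frame_ms
    return min_ms <= duration_ms <= max_ms


ok = has_valid_duration(0, 99)         # 100 frames = 500 ms
too_short = has_valid_duration(0, 19)  # 20 frames = 100 ms
too_long = has_valid_duration(0, 399)  # 400 frames = 2000 ms
```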
- in a next step S 26 , the noise level data between the frames respectively corresponding to the true head address and the true tail address are read from the noise level buffer 54 d, and a simple mean value of the noise level data is calculated. Then, in a step S 27 , it is determined whether or not the average noise level is below a predetermined value. The reason is that when the average noise level is large, the threshold value is also large, and thus, there is a possibility that the voice has not been correctly sampled; in such a case, in order to prevent malfunction, the sampling of the voice data must be made invalid so as not to recognize the voice. Therefore, in a case where “NO” is determined in the step S 27 , the process returns to the previous step S 10 (FIG. 3B) with no operation.
- the microcomputer 12 reads the voice parameter data from the voice parameter buffer 54 e in a succeeding step S 28 and reads the noise parameter data from the noise parameter buffer 54 f in a step S 29 . Then, in a step S 30 , the noise parameter data is subtracted from the voice parameter data, and result data is re-stored in the voice parameter buffer 54 e . Thus, only the feature parameters of the voice inputted to the microphone 32 can be stored in the voice parameter buffer 54 e . Then, operations from the step S 27 to the step S 30 are repeatedly executed for each frame until the tail frame is detected in a step S 31 .
- the sound signal from the audio portion 14 is directly inputted to the terminal 42 as the noise signal.
- the time when the sound from the audio portion 14 becomes the ambient noise with respect to the microphone 32 is the time when the sound is actually generated from the loudspeakers 26 R and 26 L in response to the sound signal. Therefore, a time delay of approximately 30 milliseconds, for example, occurs from the time when the sound signal is inputted to the terminal 42 to the time when the sound is inputted to the microphone 32 as the ambient noise. Consequently, if the noise parameters of the same frames as those of the voice parameters were subtracted from the voice parameters, the two would not coincide in time with each other due to the above described time difference. Therefore, in the embodiment shown, in a step S 29 , the noise parameter data which are delayed are read, that is, the noise parameter data which are delayed from the frames of the voice parameter buffer 54 e by approximately 6 frames.
- a delay circuit may be inserted between the terminal 42 and the filter bank 46 .
- an amplitude of the noise signal directly inputted to the terminal 42 is larger than an amplitude of the noise which is generated from the loudspeakers 26 R and 26 L and then inputted to the microphone 32 . Therefore, in the embodiment shown, by taking the difference of the levels into consideration, in a step S 30 , the noise parameters multiplied by a constant coefficient below 1 are subtracted from the voice parameters.
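The delayed, scaled subtraction can be sketched as follows. The 6-frame delay follows from the example figures (about 30 milliseconds at 5 milliseconds per frame). The scale value 0.5, clamping at zero, the treatment of frames with no earlier noise data, and reading the noise buffer at an earlier index are assumptions made for this sketch.

```python
# Hypothetical sketch: subtract delayed, scaled noise parameters per channel.
DELAY_FRAMES = 6  # about 30 ms at 5 ms per frame


def subtract_noise(voice_frames, noise_frames, head, tail,
                   delay=DELAY_FRAMES, scale=0.5):
    """Return cleaned voice parameters for frames head..tail (inclusive).

    voice_frames[i] and noise_frames[i] hold the channel values of frame i.
    The sound picked up by the microphone in frame i is assumed to have been
    applied to the terminal `delay` frames earlier, so noise frame i - delay
    is used. `scale` (< 1) and clamping at zero are assumptions.
    """
    cleaned = []
    for i in range(head, tail + 1):
        j = i - delay
        noise = noise_frames[j] if j >= 0 else [0.0] * len(voice_frames[i])
        cleaned.append([max(v - scale * n, 0.0)
                        for v, n in zip(voice_frames[i], noise)])
    return cleaned


# Two channels instead of eight, for brevity; values are illustrative.
voice = [[10.0, 10.0] for _ in range(8)]
noise = [[float(k), 2.0 * k] for k in range(8)]
result = subtract_noise(voice, noise, head=6, tail=7)
```

Doing the alignment in software corresponds to reading delayed data from the noise parameter buffer; a hardware delay in front of the noise filter bank would remove the need for the index offset.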
- the voice data is compressed in a step S 32 , and thereafter, a recognition operation is executed in a step S 33 . That is, it is determined which one of a number of reference patterns set in advance in the reference pattern table 54 a is most similar to the produced voice parameter pattern. Then, if a similarity S, which is an error, exceeds a predetermined value, the voice having the feature parameters is finally recognized.
- a threshold value of the similarity S may be changed in accordance with the noise level. More specifically, when the noise level is large, the threshold value of the similarity S which becomes a threshold value of the recognition is set to be small and, when the noise level is small, the threshold value of the similarity S is set to be large.
- if the registration mode is determined in the step S 9 in FIG. 3A, the microcomputer 12 executes respective steps of the registration mode shown in FIG. 3D-FIG. 3G.
- Steps S 40 -S 62 of the registration mode are wholly the same as the operations of the steps S 10 -S 32 in the previous recognition mode, and therefore, a duplicate description will be omitted here.
- the voice parameter pattern which is data-compressed in the step S 62 is stored in the reference pattern table 54 a (FIG. 2) described previously.
- a threshold value for sampling the voice parameter data is variably set in accordance with an amplitude of the noise level.
- the microcomputer 12 controls the audio portion 14 by recognizing the voice from the microphone 32 .
- the present invention is not limited to the automobile stereo of the embodiment and may be arbitrarily applied to a radio, a television set, and broadcasting equipment for background music in an office.
- the power data table 54 b and the threshold value table 54 c are used; however, the threshold value may be changed in response to the noise level through calculation for each frame.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
A voice recognizing apparatus includes a filter bank for deriving feature parameters of voice from a microphone and a filter bank for deriving feature parameters of noise represented by an electric signal which is directly inputted to a terminal. The voice parameters and the noise parameters are respectively stored in a voice parameter buffer and a noise parameter buffer. A threshold value for sampling the voice parameters is set to be changed in accordance with a noise level for each frame. From a series of the voice parameters exceeding the threshold value, a series of the noise parameters corresponding thereto are subtracted so that only a voice parameter pattern can be obtained. In a recognition mode, the voice pattern is compared with a reference pattern to recognize the voice from the microphone. In addition, in a registration mode, the voice pattern is registered as a reference pattern.
Description
- 1. Field of the Invention
- The present invention relates to an apparatus and method for recognizing voice. More specifically, the present invention relates to an apparatus and method for recognizing voice without influence of ambient noise.
- 2. Description of the Prior Art
- In a voice recognition apparatus, since voice to be recognized as well as ambient noise are inputted to a microphone, it is an important subject to correctly recognize the voice without influence of the ambient noise.
- In U.S. Pat. No. 4,239,936 issued on Dec. 16, 1980, for example, a voice recognition system including two microphones is disclosed. Voice to be recognized is inputted to one of the microphones and ambient noise is inputted to the other of the microphones and a voice signal is inputted to a recognition unit to be spectrum-analyzed and an ambient noise signal is inputted to a noise measuring unit such that strength thereof is measured. When the strength of the ambient noise exceeds a predetermined value, a threshold value is subtracted from a recognition result signal from the recognition unit in a noise rejection unit.
- In the above described prior art, it is still impossible to implement noise rejection sufficient for correctly recognizing because it is impossible to reject only the noise signal even if the above described threshold value is used since the two microphones respectively receive the voice to be recognized and the ambient noise. In addition, since the rejection standard is a constant level while the strength of the ambient noise varies, when the strength of the ambient noise is changed, the ambient noise cannot be sufficiently rejected.
- Therefore, a principal object of the present invention is to provide a novel apparatus and method for recognizing voice.
- Another object of the present invention is to provide an apparatus and method for recognizing voice in which it is possible to further reduce influence of ambient noise.
- Another object of the present invention is to provide an apparatus and method for recognizing voice in which it is possible to correctly and surely recognize a voice even if a level of ambient noise varies.
- Another object of the present invention is to provide an apparatus and method for recognizing voice in which it is possible to register a reference pattern without influence of ambient noise.
- A voice recognizing apparatus in accordance with the present invention comprises: a microphone for inputting voice; sampling means for sampling a voice signal from the microphone exceeding a threshold value; and changing means for changing the threshold value in accordance with a level of ambient noise.
- A voice recognizing method in accordance with the present invention comprises steps of: (a) detecting a level of ambient noise; (b) variably setting a threshold value in response to the level of the detected ambient noise; and (c) detecting a boundary of a voice signal inputted from a microphone in accordance with the threshold value.
- In accordance with the present invention, since a threshold value for sampling the voice signal is changed in accordance with a level of the ambient noise, it is possible to correctly recognize the voice inputted from the microphone without influence of the ambient noise even if the level of the ambient noise varies. In addition, if the present invention is utilized for registration of a reference pattern, even if such a reference pattern is registered under a noisy circumstance, it is possible to prevent a reference pattern which is modified by the ambient noise from being registered. Therefore, it is possible to recognize the voice with accuracy.
- In one embodiment, after the voice signal from the microphone is sampled in accordance with the threshold value which is determined in accordance with an amplitude of the ambient noise level, a true head and a true tail of the voice to be recognized are detected. Therefore, in accordance with this embodiment, recognition accuracy can be further increased.
- In another embodiment, the ambient noise is generated from a loudspeaker by an audio signal from audio equipment, and therefore, the audio signal which is directly inputted from the audio equipment is utilized as a signal representative of the ambient noise. In accordance with this embodiment, not only is a further microphone for converting the ambient noise into an electrical signal unnecessary, but the ambient noise level can also be reliably detected. However, the ambient noise may instead be inputted to a further microphone as sound.
- In another embodiment, a voice recognizing apparatus comprises: a microphone for inputting voice to be recognized; first sampling means for sampling a feature parameter of a voice signal from the microphone at every frame with a predetermined time interval; first converting means for converting the feature parameter sampled by the first sampling means into first feature parameter data; first memory means for storing the first feature parameter data outputted from the first converting means for a plurality of frames; first reading means for reading a series of first feature parameter data exceeding a threshold value from the first memory means; noise level detecting means for detecting a level of ambient noise; threshold value setting means for variably setting the threshold value in response to an amplitude of the ambient noise level; second sampling means for sampling a feature parameter of a signal representative of the ambient noise at every frame with a predetermined time interval; second converting means for converting the feature parameter sampled by the second sampling means into second feature parameter data; second memory means for storing the second feature parameter data outputted from the second converting means for a plurality of frames; second reading means for reading the second feature parameter data from addresses of the second memory means corresponding to addresses of the first memory means from which the first reading means reads the first feature parameter data; and subtracting means for subtracting the second feature parameter data from the first feature parameter data.
- In accordance with this embodiment, the feature parameter of the noise is eliminated from the feature parameter of the voice signal inputted from the microphone, and therefore, a feature parameter pattern for recognition or registration is not affected by the noise.
- The above described objects and other objects, features, aspects and advantages of the present invention will become more apparent from the following detailed description of the embodiments of the present invention when taken in conjunction with the accompanying drawings.
- FIG. 1 is a block diagram showing a stereo for automobile as one embodiment in accordance with the present invention.
- FIG. 2 is an illustrative view showing a memory map of a memory in FIG. 1 embodiment.
- FIGS. 3A-3G are flowcharts showing an operation of FIG. 1 embodiment.
- FIG. 4 is a waveform chart showing a state where a boundary of a voice signal is sampled in FIG. 1 embodiment.
- Referring to FIG. 1, a stereo for automobile 10 which is one embodiment in accordance with the present invention includes a microcomputer 12 by which an audio portion 14 is controlled. The audio portion 14 comprises a stereo sound source 16 including a tuner 18, a tape deck 20, a CD player 22, etc. A right signal R and a left signal L from the stereo sound source 16 are respectively applied to loudspeakers through amplifiers 24R and 24L. In a case where the stereo sound source 16 is a 4-channel stereo, rear signals are further outputted.
- A controller 28 is further included in the audio portion 14, and the controller 28 comprises operation switches (not shown) for manually operating the stereo sound source 16. However, in a case where the audio portion 14, and thus the stereo sound source 16, is to be controlled by control signals from the microcomputer 12, a voice input switch 30 provided on the audio portion 14 is operated. In this case, in addition to operation signals from the above described operation switches, control signals from the microcomputer 12 are inputted to the stereo sound source 16.
- On the other hand, on a dashboard (not shown) of the automobile, a microphone 32 for picking up voice of a driver for controlling the audio portion 14 is arranged. A voice signal from the microphone 32 is given to a filter bank 34. As is well known, the filter bank 34 includes bandpass filters of 8 channels, and therefore, feature parameters of the voice signal inputted from the microphone 32 are extracted by the bandpass filters. More specifically, the filter bank 34 comprises a preamplifier, an automatic gain control, a bandpass filter, a rectifying circuit and a lowpass filter for each channel. Respective feature parameters (analog signals) from the filter bank 34 are inputted to a multiplexer 36. The multiplexer 36 time-sequentially outputs the feature parameters of 8 channels inputted from the filter bank 34. Then, the voice signal outputted from the multiplexer 36 is converted into feature parameter data by an A/D converter 38.
- Furthermore, the right signal R and the left signal L (and rear signals, if any) from the stereo sound source 16 included in the audio portion 14 are added to each other by an adder 40, and a signal from the adder 40 is applied to a terminal 42 as an electrical signal representative of ambient noise. Thus, a sound signal is directly applied to the terminal 42 from the audio portion 14. As described above, the stereo sound signals from the audio portion 14 are generated as sound from the loudspeakers and inputted to the microphone 32 as the ambient noise. In this embodiment shown, however, by directly inputting the sound signal to the terminal 42 from the audio portion 14, the sound generated from the audio portion 14 is regarded and handled as the ambient noise.
- Then, the sound signal (noise signal) inputted to the above described terminal 42 is applied, through an attenuator 44, to a filter bank 46 having a structure similar to that of the above described filter bank 34. Feature parameters (analog signals) of respective frequency bands from the filter bank 46 are inputted to a multiplexer 48. The multiplexer 48 further receives the noise signal from the attenuator 44 as it is, and time-sequentially outputs the feature parameters of 8 channels inputted from the filter bank 46 or the whole noise signal from the attenuator 44. The feature parameters of the noise and the whole noise signal outputted from the multiplexer 48 are converted into digital data by an A/D converter 50. Thus, similarly to the voice signal from the microphone 32, the noise signal from the terminal 42 is sampled and inputted as feature parameter data.
- A signal from the above described voice input switch 30 and outputs of the A/D converters 38 and 50 are inputted to the microcomputer 12 through an input port 52. The microcomputer 12 recognizes the voice inputted from the microphone 32 by comparing the parameters inputted from the input port 52 with respective reference patterns in a reference pattern table formed in the memory 54 as described later. Then, in accordance with a recognition result, the microcomputer 12 outputs the aforementioned control signals to the audio portion 14 through an output port 56.
- Therefore, if the voice for controlling the audio portion 14 is inputted to the microphone 32 when the voice input switch 30 is operated, the control signal is outputted from the microcomputer 12 in accordance with the voice. In response to the control signal, the controller 28 controls the stereo sound source 16.
- The memory 54 includes, as shown in FIG. 2, a reference pattern table 54a in which the reference patterns of feature parameters of respective pronunciations or words, used for recognizing the voice based upon the feature parameters sampled by the filter bank 34, are set in advance. In addition, the reference pattern table 54a is constructed by a backed-up RAM, for example.
- In the memory 54, a power data pattern table 54b and a threshold value table 54c are further assigned. In the power data pattern table 54b, power data patterns of 9 sets in total are set in advance in accordance with 8 noise levels 1-8, and in correspondence with the power data patterns set in the power data pattern table 54b, threshold value data for respective noise levels are set in the threshold value table 54c. The threshold value data set in the threshold value table 54c are, for example, one and a half times the data set in the power data pattern table 54b. The reason is as follows: since the sound power data is calculated as a weighted mean value in a learning mode, if such power data were used as the threshold value data as it is, large noise inputted from the microphone 32 would be sampled as voice, and such malfunction must be prevented as far as possible. In addition, the pattern table 54b and the threshold value table 54c may be constructed by a backed-up RAM or a ROM.
- The memory 54 further includes a noise level buffer 54d, a voice parameter buffer 54e, a noise parameter buffer 54f and a threshold value buffer 54g. Each of the buffers 54d-54g has a plurality of addresses so that a series of data for a plurality of frames can be stored. In addition, one frame is set as 5 milliseconds, for example. The noise level buffer 54d stores, frame by frame, data representative of levels of the ambient noise which are applied from the attenuator 44 and the multiplexer 48 and converted into digital data by the A/D converter 50. The voice parameter buffer 54e stores, frame by frame, the feature parameter data of the voice inputted from the microphone 32 which are outputted from the A/D converter 38. The noise parameter buffer 54f stores, frame by frame, the feature parameter data of the noise signal inputted to the terminal 42 which are outputted from the A/D converter 50. The threshold value buffer 54g stores, frame by frame, the threshold value data for sampling which are variably set as described later.
- The memory 54 further includes a power data buffer 54h having addresses corresponding to respective noise levels and a power data buffer 54i having only one address. The power data buffer 54h is used in determining which of the patterns of the power data pattern table 54b is most similar to the pattern of the voice power data, in order to decide a threshold value in the learning mode described later. The power data buffer 54i is utilized in determining a threshold value in the recognition mode or registration mode described later when the noise level is small.
- In addition, the memory 54 includes a head address register 54j and a tail address register 54k. In the head address register 54j, data representative of an address of the voice parameter buffer 54e which stores a head of a series of feature parameter data exceeding the threshold value is stored. In the tail address register 54k, data representative of an address of the voice parameter buffer 54e which stores a tail of the series of feature parameter data exceeding the threshold value is stored.
- Next, with reference to FIGS. 3A-3G, an operation of the embodiment shown in FIGS. 1 and 2 will be described.
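As a reading aid, the memory map of FIG. 2 described above can be sketched as a data structure. This is a minimal illustration and not the implementation of the embodiment: the class name `MemoryMap`, all field and method names, and the use of unbounded Python lists in place of fixed address ranges are assumptions. Only the roles of the tables and buffers, the 8 channels, the 8 noise levels, the 9 power data patterns and the factor of one and a half come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

NUM_CHANNELS = 8       # bandpass channels per filter bank (from the text)
NUM_NOISE_LEVELS = 8   # noise levels 1-8 (from the text)
NUM_PATTERNS = 9       # power data patterns in table 54b (from the text)


def _empty_patterns() -> List[List[float]]:
    """One row per power data pattern, one value per noise level."""
    return [[0.0] * NUM_NOISE_LEVELS for _ in range(NUM_PATTERNS)]


def _empty_levels() -> List[float]:
    return [0.0] * NUM_NOISE_LEVELS


@dataclass
class MemoryMap:
    """Hypothetical stand-in for memory 54; names are illustrative only."""
    reference_patterns: Dict[str, List[List[float]]] = field(default_factory=dict)  # 54a
    power_patterns: List[List[float]] = field(default_factory=_empty_patterns)      # 54b
    noise_level: List[float] = field(default_factory=list)          # 54d, one entry per frame
    voice_params: List[List[float]] = field(default_factory=list)   # 54e, 8 values per frame
    noise_params: List[List[float]] = field(default_factory=list)   # 54f, 8 values per frame
    thresholds: List[float] = field(default_factory=list)           # 54g, one entry per frame
    power_by_noise_level: List[float] = field(default_factory=_empty_levels)  # 54h
    mean_voice_power: float = 0.0        # 54i, single address
    head_address: Optional[int] = None   # 54j
    tail_address: Optional[int] = None   # 54k

    def threshold_table(self) -> List[List[float]]:
        """54c: thresholds at one and a half times the 54b data (the text's example)."""
        return [[1.5 * p for p in row] for row in self.power_patterns]

    def store_frame(self, noise_level: float, voice: List[float],
                    noise: List[float], threshold: float) -> None:
        """Append one frame to buffers 54d-54g so that all share the same index."""
        if len(voice) != NUM_CHANNELS or len(noise) != NUM_CHANNELS:
            raise ValueError("expected one feature parameter per channel")
        self.noise_level.append(noise_level)
        self.voice_params.append(list(voice))
        self.noise_params.append(list(noise))
        self.thresholds.append(threshold)
```

Keeping buffers 54d to 54g index-aligned mirrors the later steps, in which an address found in the voice parameter buffer is reused to read the corresponding noise data.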
- In a first step S1 of FIG. 3A, the microcomputer 12 determines on the basis of a signal from the input port 52 whether or not the voice input switch 30 of the audio portion 14 is turned on. When the voice input switch 30 is not turned on, the learning mode is set. The learning mode is a mode other than the recognition mode, wherein the audio portion 14 is controlled by a voice input to the microphone 32, and the registration mode, wherein a voice input from the microphone 32 is registered. The learning mode is a mode for preliminarily setting a threshold value for sampling the voice signal prior to the recognition mode or registration mode.
- Therefore, if “NO” is determined in the step S1, the process proceeds to a step S2. In the step S2, data representative of a level of the whole noise signal, which is inputted to the A/D converter 50 not through the filter bank 46 and converted into digital data therein, is written in the noise level buffer 54d. In a next step S3, power data is read from an address of the power data buffer 54h corresponding to the noise level data. In addition, the power data can be evaluated by summing the feature parameters stored in the voice parameter buffer 54e. Then, in a step S4, a weighted mean value of the read power data and the power data being inputted currently is calculated. Assuming that the read power data from the power data buffer 54h, which is also a result of the calculation of a weighted mean value, is Pn, and the number of times that the noise levels which result in Pn are inputted, and a current power is N, the following equation is used for calculating a new weighted mean value Pn+1;
- The new weighted mean value Pn+1 thus evaluated is re-stored in an address of the power data buffer 54h corresponding to the noise level at that time. Thus, in the learning mode, the power data buffer 54h is renewed every time the noise level data is inputted.
- In a next step S5, the microcomputer 12 determines again whether or not the voice input switch 30 is turned on. In a case where the voice input switch 30 is not turned on, the above described steps S2 to S4 are repeatedly executed.
- When the voice input switch 30 is turned on, in a step S6, the pattern of the power data which was calculated in the previous step S4 and stored is read from the power data buffer 54h. Subsequently, in a step S7, the power data pattern which is most similar to the power data pattern as read is selected from the power data pattern table 54b. A selection method is as follows: 9 total sums of the power data are evaluated by summing the power data for each of the 9 sets of power data set in advance in the power data pattern table 54b, the total sum of the current power data is subtracted from each of the 9 total sums, and the power data pattern, out of the 9 sets, for which the resulting numerical value becomes smallest is selected.
- Then, in a step S8, with reference to the threshold value table 54c, a threshold value corresponding to the noise level inputted in the step S2 is determined. More specifically, a threshold value pattern corresponding to the power data pattern selected in the step S7 is selected from the threshold value table 54c, a threshold value corresponding to the noise level at that time is selected from the threshold values of respective noise levels included in the selected threshold value pattern, and the selected threshold value is preliminarily set as a threshold value for the recognition mode or registration mode. That is, in the learning mode, in accordance with the pattern of the power of the voice inputted from the microphone 32, the threshold value is variably set in accordance with an amplitude of the noise level. Thus, the learning mode is executed.
- When it is detected that the voice input switch 30 is turned on in the previous step S1, the microcomputer 12 determines whether or not the registration mode is set in a step S9. Then, if the registration mode is not set, the recognition mode will be executed.
- In a first step S10 of the recognition mode as shown in FIG. 3B, the microcomputer 12 starts sampling of the voice input from the microphone 32 and the noise level from the terminal 42. Then, in a step S11, sampled noise level data, sampled voice parameter data and sampled noise parameter data are respectively stored in the noise level buffer 54d, the voice parameter buffer 54e and the noise parameter buffer 54f. In a step S12, the microcomputer 12 determines whether or not the noise level inputted at that time is small.
- If the noise level is not small, in a step S13, the microcomputer 12 reads the noise level data of the past frames (for example, 10 frames) from the noise level buffer 54d. Then, in a step S14, by calculating a weighted mean value of the noise levels, a threshold value is determined. That is, on the basis of consideration similar to that of the previous equation, a weighted mean value of the noise levels is calculated, and in accordance with the noise level thus obtained, a threshold value corresponding to the noise level is read from the threshold value table 54c. However, in order to set a threshold value which is better matched to the current noise situation, the weight of the current noise level is made heavier than that of the past noise levels.
- Next, in a step S15, the microcomputer 12 determines whether or not the noise level is currently increasing by comparing the data of the current noise level with the data of the immediately preceding frame which is stored in the noise level buffer 54d.
- If the noise level increases, in a step S16, the microcomputer 12 calculates a weighted mean value of the threshold values of the past frames being stored in the threshold value buffer 54g and the threshold value of the current frame evaluated in the previous step S14. At this time, a weighting coefficient of the threshold value of the current frame (for example, “1.0”) is set to be larger than a weighting coefficient of the past threshold value (for example, “0.5”) because it is necessary to set a larger threshold value in correspondence with the increase of the noise level.
- If the noise level does not increase, in a step S17, the microcomputer 12 calculates a weighted mean value of the threshold values of the past frames and the threshold value of the current frame. At this time, in reverse to the step S16, a weighting coefficient of the threshold value of the current frame (for example, “0.5”) is set to be smaller than a weighting coefficient of the past threshold value (for example, “1.0”) because it is necessary to set a smaller threshold value in correspondence with the decrease of the noise level.
- In addition, if “YES” is determined in the previous step S12, that is, if it is determined that the noise level is small, in a step S18, the microcomputer 12 sets a threshold value on the basis of the power data of the past frames. More specifically, a simple mean value of the voice power of the past frames is calculated by the microcomputer 12 and stored in the power data buffer 54i (FIG. 2), and a power approximately one and a half times the power stored in the power data buffer 54i is set as a threshold value for sampling.
- Thus, in the recognition mode, a threshold value for sampling the voice to be recognized is determined in accordance with an amplitude of the noise level inputted to the terminal 42 from the audio portion 14. In any case, the threshold value thus set is stored in the threshold value buffer 54g.
- Next, in a step S19, the microcomputer 12 determines whether or not the feature parameter of the voice being stored in the voice parameter buffer 54e exceeds the threshold value set as described above. If the feature parameter of a given frame exceeds the threshold value, since the frame is a head frame of the voice or word to be recognized, in a next step S20, the microcomputer 12 loads an address of the voice parameter buffer 54e in which the feature parameter of that frame is stored to the head address register 54j as the head address. In FIG. 4 example, an address of the frame Fh′ becomes the head address.
- Next, in a step S21, the microcomputer 12 determines whether or not the feature parameter being stored in the voice parameter buffer 54e falls below the threshold value. If the feature parameter of a given frame falls below the threshold value, since the frame is a tail frame of the voice or word to be recognized, in a next step S22, the microcomputer 12 loads an address of the voice parameter buffer 54e in which the voice feature parameter data of that frame is stored to the tail address register 54k as the tail address. In FIG. 4 example, an address of the frame Ft′ becomes the tail address.
- Thus, in accordance with the threshold value which is set in any one of the steps S16-S18, the feature parameters of the successive frames from Fh′ to Ft′ are provisionally sampled as the feature parameters of the voice to be recognized.
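The pattern selection of the step S7, the threshold update of the steps S15 to S17 and the provisional boundary search of the steps S19 to S22 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the embodiment's code. The function names are hypothetical. The weighting coefficients 1.0 and 0.5 are the examples given in the text, but dividing by the sum of the weights is an assumption, as is the use of the absolute difference of total sums in the selection. Comparing a single per-frame power value with the threshold is also an assumption, since the text only says that the feature parameter exceeds or falls below the threshold value.

```python
from typing import Optional, Sequence, Tuple


def select_pattern(pattern_table: Sequence[Sequence[float]],
                   learned_power: Sequence[float]) -> int:
    """Step S7: index of the table pattern whose total sum is closest
    to the total sum of the learned power data (assumed distance)."""
    target = sum(learned_power)
    return min(range(len(pattern_table)),
               key=lambda k: abs(sum(pattern_table[k]) - target))


def smooth_threshold(past: float, current: float,
                     noise_increasing: bool) -> float:
    """Steps S16/S17: weighted mean of the past and current thresholds.
    The current frame is weighted more heavily while noise is rising."""
    if noise_increasing:
        w_current, w_past = 1.0, 0.5   # step S16 example weights
    else:
        w_current, w_past = 0.5, 1.0   # step S17 example weights
    return (w_current * current + w_past * past) / (w_current + w_past)


def find_provisional_boundary(frame_power: Sequence[float],
                              thresholds: Sequence[float]
                              ) -> Optional[Tuple[int, int]]:
    """Steps S19-S22: return (head, tail) frame indices, i.e. Fh' and Ft'.
    The head is the first frame above its threshold; the tail is the first
    later frame below its threshold. Returns None if either is missing."""
    head = None
    for i, (power, threshold) in enumerate(zip(frame_power, thresholds)):
        if head is None:
            if power > threshold:
                head = i
        elif power < threshold:
            return head, i
    return None
```

For example, with a constant threshold of 3 and frame powers `[1, 1, 5, 6, 7, 2, 1]`, the provisional head is frame 2 and the provisional tail is frame 5.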
- In next steps S23 and S24, the microcomputer 12 seeks a true head address and a true tail address, respectively. The reason is that the threshold value previously set is determined in accordance with an amplitude of the noise level, so that if the noise level is large, the threshold value is also large, and it is apprehended that a head and a tail of the voice cannot be sampled correctly. Then, in the step S23, by searching for a frame in which the voice power was minimum out of the frames before the frame indicated by the head address determined in the step S20, a true head of the voice to be recognized is sought. In FIG. 4 example, an address of the frame Fh becomes the true head address. Similarly, in the step S24, by searching for a frame in which the voice power was minimum out of the frames after the frame indicated by the tail address determined in the step S22, a true tail of the voice to be recognized is sought. In FIG. 4 example, an address of the frame Ft becomes the true tail address.
- Then, in a step S25, the microcomputer 12 determines whether or not a time period from the frame indicated by the true head address to the frame indicated by the true tail address is within a proper length, for example, 0.3-1.5 seconds. A value of this time period is set experimentally, and thus, the same may be changed suitably. If the time period is not proper, the process returns to the previous step S10 (FIG. 3B) without execution of the following recognition operation.
- If the time period is proper, in a next step S26, the noise level data between the frames respectively corresponding to the true head address and the true tail address are read from the noise level buffer 54d, and a simple mean value of the noise level data is calculated. Then, in a step S27, it is determined whether or not the average noise level is below a predetermined value. The reason is that when the average noise level is large, the threshold value is also large, and thus, there is a possibility that the voice has not been correctly sampled; in such a case, in order to prevent malfunction, the sampling of the voice data must be made invalid so that the voice is not recognized. Therefore, in a case where “NO” is determined in the step S27, the process returns to the previous step S10 (FIG. 3B) with no operation.
- In a case where “YES” is determined in the step S27, the microcomputer 12 reads the voice parameter data from the voice parameter buffer 54e in a succeeding step S28 and reads the noise parameter data from the noise parameter buffer 54f in a step S29. Then, in a step S30, the noise parameter data is subtracted from the voice parameter data, and result data is re-stored in the voice parameter buffer 54e. Thus, only the feature parameters of the voice inputted to the microphone 32 can be stored in the voice parameter buffer 54e. Then, operations from the step S27 to the step S30 are repeatedly executed for each frame until the tail frame is detected in a step S31.
- In addition, as described above, in this embodiment shown, the sound signal from the audio portion 14 is directly inputted to the terminal 42 as the noise signal. On the other hand, a time when the sound from the audio portion 14 becomes the ambient noise with respect to the microphone 32 is a time when the sound actually generated from the loudspeakers is inputted to the microphone 32 as the ambient noise. Therefore, if the noise parameters of the same frames as those of the voice parameters were subtracted from the voice parameters, the two would not coincide in time with each other due to the above described time difference. Therefore, in this embodiment shown, in the step S29, the noise parameter data which are delayed are read. That is, the noise parameter data which are delayed from the frames of the voice parameter buffer 54e by approximately 6 frames are read.
- In addition, in order to make the voice parameters and the noise parameters coincide in time with each other by taking the above described delay time into consideration, a delay circuit may be inserted between the terminal 42 and the filter bank 46.
- Furthermore, an amplitude of the noise signal directly inputted to the terminal 42 is larger than an amplitude of the noise which is generated from the loudspeakers and inputted to the microphone 32. Therefore, in this embodiment shown, by taking a difference of the levels into consideration, in the step S30, the noise parameters multiplied by α (α is a constant below 1) are subtracted from the voice parameters.
- Next, the voice data is compressed in a step S32, and thereafter, a recognition operation is executed in a step S33. That is, it is determined which of a number of reference patterns set in advance in the reference pattern table 54a is most similar to the produced voice parameter pattern. Then, if a similarity S which is an error exceeds a predetermined value, the voice having the feature parameters is finally recognized. However, a threshold value of the similarity S may be changed in accordance with the noise level. More specifically, when the noise level is large, the threshold value of the similarity S, which serves as a threshold value of the recognition, is set to be small and, when the noise level is small, the threshold value of the similarity S is set to be large. The reason is that when the noise level is large, larger noise may be mixed with the voice from the microphone 32, and therefore, if the threshold value is strict, a result occurs in which almost no voice or words can be recognized. Therefore, when the noise level is large, the voice or word having a smaller similarity is recognized.
- However, since no feature exists in the recognition operation itself, it is possible to apply, for example, the recognition method used in the U.S. Pat. No. 4,239,936 previously cited. Therefore, the recognition method is not specifically described in detail. In addition, the reason why the voice data is data-compressed in the step S32 is to increase the recognition speed, and therefore, if not necessary, such data compression is not required.
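The boundary refinement of the steps S23 to S25 and the delayed, scaled subtraction of the steps S28 to S30 can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the search window of 20 frames and the value 0.5 for α are hypothetical (the text only says that α is a constant below 1 and that the delay is approximately 6 frames). Treating the noise as zero where no delayed frame exists, and leaving negative results unclipped, are also assumptions.

```python
from typing import List, Sequence

FRAME_MS = 5        # one frame is 5 ms in the text's example
DELAY_FRAMES = 6    # approximate loudspeaker-to-microphone delay (from the text)
ALPHA = 0.5         # assumed value; the text only requires a constant below 1


def true_head(power: Sequence[float], head: int, window: int = 20) -> int:
    """Step S23: minimum-power frame among up to `window` frames before `head`.
    Ties are resolved toward the frame nearest the provisional head."""
    candidates = range(max(0, head - window), head)
    if not candidates:
        return head
    return min(candidates, key=lambda i: (power[i], -i))


def true_tail(power: Sequence[float], tail: int, window: int = 20) -> int:
    """Step S24: minimum-power frame among up to `window` frames after `tail`."""
    candidates = range(tail + 1, min(len(power), tail + 1 + window))
    if not candidates:
        return tail
    return min(candidates, key=lambda i: (power[i], i))


def duration_ok(head: int, tail: int,
                min_ms: int = 300, max_ms: int = 1500) -> bool:
    """Step S25: accept only utterances lasting 0.3-1.5 s (the text's example)."""
    duration_ms = (tail - head + 1) * FRAME_MS
    return min_ms <= duration_ms <= max_ms


def subtract_noise(voice: Sequence[Sequence[float]],
                   noise: Sequence[Sequence[float]],
                   head: int, tail: int,
                   delay: int = DELAY_FRAMES,
                   alpha: float = ALPHA) -> List[List[float]]:
    """Steps S28-S30: for each voice frame i, subtract alpha times the noise
    frame that entered terminal 42 `delay` frames earlier."""
    result = []
    for i in range(head, tail + 1):
        j = i - delay
        noise_frame = noise[j] if j >= 0 else [0.0] * len(voice[i])
        result.append([v - alpha * n for v, n in zip(voice[i], noise_frame)])
    return result
```

Reading the noise frame at index `i - delay` reflects the explanation above: the electrical signal reaches the terminal 42 before the corresponding sound reaches the microphone 32, so the matching noise data lies earlier in the noise parameter buffer.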
- Then, if it is detected that the registration mode is set in the step S9 in FIG. 3A, the microcomputer 12 executes respective steps of the registration mode shown in FIG. 3D-FIG. 3G. Steps S40-S62 of the registration mode are wholly the same as the operations of the steps S10-S32 in the previous recognition mode, and therefore, a duplicate description will be omitted here. However, in a step S63, the voice parameter pattern which is data-compressed in the step S62 is stored in the reference pattern table 54a (FIG. 2) described previously.
- In addition, in the registration mode, when the noise level becomes larger than a predetermined value, the registration of the reference pattern is inhibited (step S57), and a threshold value for sampling the voice parameter data is variably set in accordance with an amplitude of the noise level.
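The inhibition of registration under noise (step S57) followed by storage of the pattern (step S63) might look like the sketch below. It is only an illustration: the function name `register_reference`, the dictionary used for the reference pattern table 54a and the parameter `max_mean_noise` are hypothetical. Using a simple mean of the noise levels mirrors the step S26 of the recognition mode, which the text says the registration mode duplicates; treating a mean exactly equal to the limit as too noisy is an assumption.

```python
from statistics import mean
from typing import Dict, List, Sequence

Pattern = List[List[float]]   # one list of feature parameters per frame


def register_reference(table: Dict[str, Pattern], word: str,
                       pattern: Pattern,
                       noise_levels: Sequence[float],
                       max_mean_noise: float) -> bool:
    """Store `pattern` under `word` unless the utterance was too noisy.
    Returns True if the pattern was registered, False if inhibited."""
    # Step S57: registration is inhibited unless the mean noise level over
    # the utterance is below the predetermined value.
    if mean(noise_levels) >= max_mean_noise:
        return False
    # Step S63: store a copy so later buffer updates cannot alter the table.
    table[word] = [list(frame) for frame in pattern]
    return True
```

Returning a flag rather than raising an error lets a caller simply prompt the speaker to repeat the word once the noise has dropped.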
- In addition, in the above described embodiment, the microcomputer 12 is constructed so as to control the audio portion 14 by recognizing the voice from the microphone 32. However, the present invention is not limited to the automobile stereo of the embodiment and may be arbitrarily applied to a radio, a television set and broadcasting equipment for background music in an office.
- Furthermore, in the above described embodiment, in order to variably set the threshold value, the power data pattern table 54b and the threshold value table 54c are used; however, the threshold value may be changed in response to the noise level through calculation for each frame.
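The table-free alternative mentioned above, in which the threshold value is calculated for each frame, is not specified in the text. One possible form is sketched below purely as an assumption: a mean of recent noise levels with the current frame weighted more heavily (as in the step S14), mapped to a threshold by a linear rule. The function names, the linear rule and the constants `current_weight`, `base` and `gain` are all hypothetical.

```python
from typing import Sequence


def weighted_noise_level(levels: Sequence[float],
                         current_weight: float = 2.0) -> float:
    """Weighted mean of noise levels, oldest first; the last entry is the
    current frame and receives `current_weight`, earlier frames weight 1."""
    if not levels:
        raise ValueError("at least one noise level is required")
    weights = [1.0] * (len(levels) - 1) + [current_weight]
    return sum(w * x for w, x in zip(weights, levels)) / sum(weights)


def threshold_from_noise(levels: Sequence[float],
                         base: float = 10.0, gain: float = 1.5) -> float:
    """Assumed linear mapping from the weighted noise level to a threshold."""
    return base + gain * weighted_noise_level(levels)
```

With these assumed constants, `threshold_from_noise([1.0, 1.0, 4.0])` evaluates to 13.75, and a louder current frame always raises the threshold.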
- Although the present invention has been described and illustrated in detail, it is clearly understood that the same is by way of illustration and example only and is not to be taken by way of limitation, the spirit and scope of the present invention being limited only by the terms of the appended claims.
Claims (37)
1. A voice recognizing apparatus, comprising:
a microphone for inputting voice;
sampling means for detecting a boundary of a voice signal from said microphone exceeding a threshold value; and
changing means for changing said threshold value in accordance with a level of ambient noise.
2. A voice recognizing apparatus in accordance with claim 1, further comprising invalidating means for invalidating said sampling means when a level of said ambient noise is large.
3. A voice recognizing apparatus in accordance with claim 2, wherein said sampling means samples said voice signal exceeding said threshold value through 2 stages.
4. A voice recognizing apparatus in accordance with claim 1, further comprising recognizing means for recognizing the voice signal sampled by said sampling means.
5. A voice recognizing apparatus in accordance with claim 4, wherein said recognizing means includes standard changing means for changing a determination standard of recognition in accordance with a level of said ambient noise.
6. A voice recognizing apparatus in accordance with claim 1, further comprising registering means for registering a reference pattern on the basis of the sound signal sampled by said sampling means.
7. A voice recognizing apparatus in accordance with claim 6, further comprising invalidating means for substantially invalidating said registering means when the level of said ambient noise is large.
8. A voice recognizing apparatus in accordance with claim 1, wherein said changing means includes means for setting said threshold value in accordance with a level of the voice signal from said microphone when a level of said ambient noise is below a predetermined value.
9. A voice recognizing apparatus, comprising:
a microphone for inputting voice; and
changing means for changing a threshold value for sampling a voice signal from said microphone in accordance with a level of ambient noise.
10. A voice recognizing apparatus in accordance with claim 9, further comprising mode setting means for setting a first mode wherein the voice signal from said microphone is to be processed or a second mode wherein the voice signal from said microphone is not to be processed; and means for setting a threshold value which is utilized in said first mode on the basis of the voice signal from said microphone when said second mode is set.
11. A voice recognizing method comprising steps of:
(a) detecting a level of ambient noise;
(b) variably changing a threshold value in accordance with the level of the ambient noise as detected; and
(c) detecting a boundary of a voice signal inputted from a microphone in accordance with the threshold value.
12. A voice recognizing method in accordance with claim 11, further comprising a step of (d) recognizing a voice signal as sampled.
13. A voice recognizing method in accordance with claim 11, further comprising a step of (e) registering a reference pattern on the basis of the voice signal as sampled.
14. A voice recognizing method comprising steps of:
(a) detecting a level of ambient noise; and
(b) variably setting a threshold value for sampling a voice signal from a microphone in accordance with the level of the ambient noise as detected.
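The method steps (a) to (c) recited above can be illustrated with a minimal Python sketch. It is not the patented implementation: the linear rule in `threshold_from_noise`, its `base` and `gain` parameters, and the use of mean frame power as the noise level are all assumptions chosen only to make the example concrete.

```python
from typing import List, Optional, Tuple


def threshold_from_noise(noise_level: float, base: float = 1.0, gain: float = 2.0) -> float:
    """Step (b): raise the threshold as ambient noise rises (assumed linear rule)."""
    return base + gain * noise_level


def detect_boundary(frame_power: List[float], threshold: float) -> Optional[Tuple[int, int]]:
    """Step (c): return (head, tail) frame indices of the first run above the threshold."""
    head = None
    for i, p in enumerate(frame_power):
        if p > threshold and head is None:
            head = i                      # first frame exceeding the threshold
        elif p <= threshold and head is not None:
            return head, i - 1            # last frame that still exceeded it
    if head is not None:
        return head, len(frame_power) - 1  # voice runs to the end of the buffer
    return None                            # nothing exceeded the threshold


# Step (a): estimate the ambient noise level (assumed: mean power of noise-only frames).
noise_frames = [0.4, 0.6, 0.5]
noise_level = sum(noise_frames) / len(noise_frames)

threshold = threshold_from_noise(noise_level)
voice_power = [0.5, 0.7, 3.0, 4.2, 3.8, 0.6, 0.4]
boundary = detect_boundary(voice_power, threshold)
print(boundary)
```

With a higher noise level the same call yields a higher threshold, so frames that would otherwise be taken as voice are left out of the sampled segment.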
15. A voice recognizing apparatus, comprising:
a microphone to which voice to be recognized is inputted;
first sampling means for sampling a feature parameter of a voice signal from said microphone for each frame with a predetermined time interval;
first converting means for converting said feature parameter sampled by said first sampling means into first feature parameter data;
first memory means for storing said first feature parameter data outputted from said first converting means for a plurality of frames;
first reading means for reading a series of said first feature parameter data exceeding a threshold value from said first memory means;
noise level detecting means for detecting a level of ambient noise; and
threshold value setting means for variably setting said threshold value in accordance with an amplitude of the level of said ambient noise.
16. A voice recognizing apparatus in accordance with claim 15, wherein said threshold value setting means includes means for setting said threshold value on the basis of power data of said voice signal inputted from said microphone when said level of said ambient noise is small.
17. A voice recognizing apparatus in accordance with claim 15, wherein said threshold value setting means includes weighted mean value calculation means for calculating a weighted mean value of said threshold value.
18. A voice recognizing apparatus in accordance with claim 17, further comprising second memory means for storing said threshold value set by said threshold value setting means for each frame; wherein said weighted mean value calculation means calculates a weighted mean value of one or more threshold values of one or more prior frames read from said second memory means and a threshold value of a current frame.
19. A voice recognizing apparatus in accordance with claim 18, further comprising detecting means for detecting whether or not said level of said ambient noise increases; wherein said weighted mean value calculation means sets a weight of the threshold value of said current frame to be larger than a weight of the threshold values of said prior frames when said detecting means detects that said level of said ambient noise increases.
20. A voice recognizing apparatus in accordance with claim 19, wherein said weighted mean value calculation means sets a weight of the threshold value of said current frame to be smaller than a weight of the threshold values of said prior frames when said detecting means does not detect that said level of said ambient noise increases.
21. A voice recognizing apparatus in accordance with claim 17, further comprising detecting means for detecting whether or not a level of said ambient noise increases; wherein said weighted mean value calculation means sets a weight of the threshold value of said current frame to be smaller than a weight of the threshold values of said prior frames when said detecting means does not detect that said level of said ambient noise increases.
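The weighted mean threshold recited in claims 17 to 21 can be sketched as follows. The class name `ThresholdSmoother`, the history length of two frames, and the weights 0.75 and 0.25 are hypothetical; the claims only require that the current frame's weight be larger than the prior frames' weights when the noise level rises and smaller otherwise. The sketch also assumes that the stored per-frame values are the unsmoothed thresholds.

```python
from collections import deque
from typing import Deque, Optional


class ThresholdSmoother:
    """Weighted mean of the current threshold and thresholds of prior frames."""

    def __init__(self, history: int = 2, w_rising: float = 0.75, w_steady: float = 0.25):
        self.prior: Deque[float] = deque(maxlen=history)  # plays the role of the second memory
        self.prev_noise: Optional[float] = None
        self.w_rising = w_rising   # current-frame weight when noise is increasing
        self.w_steady = w_steady   # current-frame weight otherwise

    def update(self, raw_threshold: float, noise_level: float) -> float:
        # Detect whether the ambient noise level increased since the previous frame.
        rising = self.prev_noise is not None and noise_level > self.prev_noise
        self.prev_noise = noise_level

        if not self.prior:
            smoothed = raw_threshold  # no prior frames yet, nothing to average with
        else:
            w_cur = self.w_rising if rising else self.w_steady
            prior_mean = sum(self.prior) / len(self.prior)
            smoothed = w_cur * raw_threshold + (1.0 - w_cur) * prior_mean

        self.prior.append(raw_threshold)
        return smoothed


smoother = ThresholdSmoother()
first = smoother.update(2.0, noise_level=0.5)   # no history: returned unchanged
second = smoother.update(4.0, noise_level=1.0)  # noise rising: follows the new value quickly
third = smoother.update(2.0, noise_level=1.0)   # noise not rising: leans on prior frames
print(first, second, third)
```

Weighting the current frame heavily only when noise rises lets the threshold climb quickly as noise appears while falling back slowly, which is one plausible reading of why the claims treat the two cases differently.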
22. A voice recognizing apparatus in accordance with claim 15, further comprising invalidating means for invalidating said first reading means when the level of said ambient noise is more than a predetermined value.
23. A voice recognizing apparatus in accordance with claim 15, further comprising first and second address storing means for respectively storing a head frame and a tail frame of said first feature parameter data read from said first memory means by said first reading means.
24. A voice recognizing apparatus in accordance with claim 23, further comprising address determining means for determining a true head frame address and a true tail frame address on the basis of said head frame address and said tail frame address respectively stored in said first and second address storing means; wherein said first reading means reads said first feature parameter data from said first memory means from said true head frame address to said true tail frame address.
25. A voice recognizing apparatus in accordance with claim 24, wherein said address determining means includes means for determining an address of said first memory means which is prior to said head frame address stored in said first address storing means and stores a minimum value as said true head frame address and an address of said first memory means which is after said head frame address stored in said first address storing means and stores a minimum value as said true tail frame address.
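The determination of a true head frame and a true tail frame in claims 24 and 25 can be illustrated with a short sketch. It assumes that the stored "minimum value" is the minimum frame power, that the search is limited to a hypothetical window of `search` frames, and that the tail search runs forward from the stored tail frame address; none of these details is fixed by the claim wording.

```python
from typing import List, Tuple


def refine_boundary(power: List[float], head: int, tail: int, search: int = 5) -> Tuple[int, int]:
    """Widen a threshold-based (head, tail) pair out to nearby power minima."""
    # Frames strictly before the provisional head, limited to the search window.
    before = range(max(0, head - search), head)
    # Frames strictly after the provisional tail, limited to the search window.
    after = range(tail + 1, min(len(power), tail + 1 + search))

    # Pick the minimum-power frame on each side; keep the original index
    # when there are no frames to search (segment touches the buffer edge).
    true_head = min(before, key=power.__getitem__) if before else head
    true_tail = min(after, key=power.__getitem__) if after else tail
    return true_head, true_tail


power = [0.5, 0.2, 0.9, 1.5, 4.0, 5.0, 3.5, 1.2, 0.7, 0.2, 0.6]
# Frames 4..6 exceed a threshold of 2.0, so they form the provisional segment.
refined = refine_boundary(power, head=4, tail=6)
print(refined)
```

Extending the segment to the surrounding minima keeps low-energy onsets and decays of the utterance that fall below the threshold but still belong to the voice.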
26. A voice recognizing apparatus in accordance with claim 15, further comprising:
second sampling means for sampling a feature parameter of a signal representative of said ambient noise for each frame with said predetermined time interval;
second converting means for converting said feature parameter sampled by said second sampling means into second feature parameter data;
second memory means for storing said second feature parameter data outputted from said second converting means for a plurality of frames;
second reading means for reading said second feature parameter data from an address of said second memory means corresponding to that of said first memory means which is read by said first reading means; and
subtracting means for subtracting said second feature parameter data from said first feature parameter data.
27. A voice recognizing apparatus in accordance with claim 26, wherein said second reading means reads said second feature parameter data from an address of said second memory means equal to a frame which is delayed from a frame corresponding to an address of said first memory means which is read by said first reading means.
28. A voice recognizing apparatus in accordance with claim 26, wherein said subtracting means subtracts said second feature parameter data read from said second memory means which is multiplied by a predetermined constant from said first feature parameter data.
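The subtraction described in claims 26 to 28 can be sketched as below. The function name `subtract_noise`, the signed `offset` parameter (including its sign convention for the delayed frame), the constant `k`, and the clamping of results at zero are assumptions for illustration; the claims state only that delayed noise feature data, multiplied by a predetermined constant, is subtracted from the voice feature data.

```python
from typing import List

Frame = List[float]  # one frame of feature parameter data, e.g. band powers


def subtract_noise(voice: List[Frame], noise: List[Frame],
                   start: int, end: int,
                   offset: int = 1, k: float = 0.5) -> List[Frame]:
    """Subtract k times the offset noise frame from each voice frame in [start, end]."""
    result = []
    for t in range(start, end + 1):
        # Read the noise buffer at the frame shifted by `offset`,
        # clamped so the index stays inside the buffer.
        n = min(max(t + offset, 0), len(noise) - 1)
        # Element-wise subtraction; negative results are floored at zero (assumption).
        result.append([max(0.0, v - k * x) for v, x in zip(voice[t], noise[n])])
    return result


voice = [[5.0, 4.0], [6.0, 3.0], [7.0, 2.0]]   # first feature parameter data
noise = [[2.0, 2.0], [4.0, 8.0], [2.0, 4.0]]   # second feature parameter data
cleaned = subtract_noise(voice, noise, start=0, end=1)
print(cleaned)
```

The frame shift is meant to account for the time difference between the directly inputted audio signal and the same sound arriving at the microphone, and the constant scales the noise estimate to the level actually picked up.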
29. A voice recognizing apparatus in accordance with claim 26, further comprising recognition means for recognizing the voice inputted from said microphone on the basis of a subtraction result by said subtracting means.
30. A voice recognizing apparatus in accordance with claim 29, further comprising first invalidating means for invalidating substantially said recognition means when said level of said ambient noise is large.
31. A voice recognizing apparatus in accordance with claim 26, further comprising registration means for registering a feature parameter pattern of the voice inputted from said microphone on the basis of a subtraction result of said subtracting means.
32. A voice recognizing apparatus in accordance with claim 31, further comprising second invalidating means for invalidating substantially said registration means when said level of said ambient noise is large.
33. A voice recognizing apparatus in accordance with claim 15 or 26, wherein said ambient noise is generated from a loudspeaker in response to an audio signal from an audio equipment, and said ambient noise level detecting means includes audio signal inputting means for directly inputting said audio signal from said audio equipment.
34. A voice recognizing apparatus in accordance with claim 33, further comprising recognition means for recognizing the voice inputted from said microphone through a comparison of said first feature parameter data read by said first reading means and a reference pattern.
35. A voice recognizing apparatus in accordance with claim 34, further comprising controlling means for controlling said audio equipment in response to a recognition result of said recognition means.
36. A voice recognizing apparatus in accordance with claim 15, wherein said threshold value setting means includes means for setting said threshold value in accordance with a level of a voice signal from said microphone when said level of said ambient noise is below a predetermined value.
37. A voice recognizing apparatus in accordance with claim 15, further comprising mode setting means for setting a first mode wherein the voice signal from said microphone is to be processed or a second mode wherein the voice signal from said microphone is not to be processed; and means for setting a threshold value which is utilized in said first mode on the basis of the voice signal from said microphone when said second mode is set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US08/897,734 US6411928B2 (en) | 1990-02-09 | 1997-07-21 | Apparatus and method for recognizing voice with reduced sensitivity to ambient noise |
Applications Claiming Priority (11)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2-030185 | 1990-02-09 | ||
JP2030185A JP2966460B2 (en) | 1990-02-09 | 1990-02-09 | Voice extraction method and voice recognition device |
JP2-30185 | 1990-02-09 | ||
JP2-278393 | 1990-10-16 | ||
JP2278393A JP2648014B2 (en) | 1990-10-16 | 1990-10-16 | Audio clipping device |
JP2-281020 | 1990-10-18 | ||
JP2281020A JPH04155400A (en) | 1990-10-18 | 1990-10-18 | Voice recognition device |
US65342691A | 1991-02-08 | 1991-02-08 | |
US8039693A | 1993-06-21 | 1993-06-21 | |
US35387894A | 1994-12-12 | 1994-12-12 | |
US08/897,734 US6411928B2 (en) | 1990-02-09 | 1997-07-21 | Apparatus and method for recognizing voice with reduced sensitivity to ambient noise |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US35387894A (Continuation) | | 1990-02-09 | 1994-12-12 |
Publications (2)
Publication Number | Publication Date |
---|---|
US20010029449A1 (en) | 2001-10-11 |
US6411928B2 (en) | 2002-06-25 |
Family ID: 27549510
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US08/897,734 Expired - Fee Related US6411928B2 (en) | 1990-02-09 | 1997-07-21 | Apparatus and method for recognizing voice with reduced sensitivity to ambient noise |
Country Status (1)
Country | Link |
---|---|
US (1) | US6411928B2 (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040059573A1 (en) * | 2001-02-20 | 2004-03-25 | Hwajin Cheong | Voice command identifier for a voice recognition system |
US6889191B2 (en) * | 2001-12-03 | 2005-05-03 | Scientific-Atlanta, Inc. | Systems and methods for TV navigation with compressed voice-activated commands |
US20080033722A1 (en) * | 2001-07-10 | 2008-02-07 | American Express Travel Related Services Company, Inc. | Method and system for hand geometry recognition biometrics on a fob |
US20080046379A1 (en) * | 2001-07-10 | 2008-02-21 | American Express Travel Related Services Company, Inc. | System and method for proffering multiple biometrics for use with a fob |
US20100177178A1 (en) * | 2009-01-14 | 2010-07-15 | Alan Alexander Burns | Participant audio enhancement system |
US20110161084A1 (en) * | 2009-12-29 | 2011-06-30 | Industrial Technology Research Institute | Apparatus, method and system for generating threshold for utterance verification |
US8738382B1 (en) * | 2005-12-16 | 2014-05-27 | Nvidia Corporation | Audio feedback time shift filter system and method |
US8938081B2 (en) | 2010-07-06 | 2015-01-20 | Dolby Laboratories Licensing Corporation | Telephone enhancements |
EP3125540A1 (en) * | 2015-07-31 | 2017-02-01 | Fujitsu Limited | Information presentation method, information presentation apparatus, and program |
CN106611596A (en) * | 2015-10-22 | 2017-05-03 | 德克萨斯仪器股份有限公司 | Time-based frequency tuning of analog-to-information feature extraction |
US20180011843A1 (en) * | 2016-07-07 | 2018-01-11 | Samsung Electronics Co., Ltd. | Automatic interpretation method and apparatus |
US20190029563A1 (en) * | 2017-07-26 | 2019-01-31 | Intel Corporation | Methods and apparatus for detecting breathing patterns |
US11169773B2 (en) * | 2014-04-01 | 2021-11-09 | TekWear, LLC | Systems, methods, and apparatuses for agricultural data collection, analysis, and management via a mobile device |
US20210382964A1 (en) * | 2017-03-01 | 2021-12-09 | Stmicroelectronics (Grenoble 2) Sas | Method and apparatus for processing a histogram output from a detector sensor |
US11581004B2 (en) | 2020-12-02 | 2023-02-14 | HearUnow, Inc. | Dynamic voice accentuation and reinforcement |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6665645B1 (en) * | 1999-07-28 | 2003-12-16 | Matsushita Electric Industrial Co., Ltd. | Speech recognition apparatus for AV equipment |
US7263074B2 (en) * | 1999-12-09 | 2007-08-28 | Broadcom Corporation | Voice activity detection based on far-end and near-end statistics |
US7444353B1 (en) * | 2000-01-31 | 2008-10-28 | Chen Alexander C | Apparatus for delivering music and information |
US6408277B1 (en) * | 2000-06-21 | 2002-06-18 | Banter Limited | System and method for automatic task prioritization |
US7132766B2 (en) * | 2003-03-25 | 2006-11-07 | Rockwell Automation Technologies, Inc. | Method and apparatus for providing a switching signal in the presence of noise |
US20040195919A1 (en) * | 2003-04-02 | 2004-10-07 | Gasperi Michael Lee | Method and apparatus for providing a switching signal in the presence of noise |
US8005668B2 (en) * | 2004-09-22 | 2011-08-23 | General Motors Llc | Adaptive confidence thresholds in telematics system speech recognition |
US7567165B2 (en) | 2006-10-27 | 2009-07-28 | At&T Intellectual Property, I, L.P. | Methods, devices, and computer program products for providing ambient noise sensitive alerting |
US8094838B2 (en) * | 2007-01-15 | 2012-01-10 | Eastman Kodak Company | Voice command of audio emitting device |
CN101359472B (en) * | 2008-09-26 | 2011-07-20 | 炬力集成电路设计有限公司 | Method for distinguishing voice and apparatus |
CN102044244B (en) * | 2009-10-15 | 2011-11-16 | 华为技术有限公司 | Signal classifying method and device |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA1116300A (en) | 1977-12-28 | 1982-01-12 | Hiroaki Sakoe | Speech recognition system |
US4696039A (en) * | 1983-10-13 | 1987-09-22 | Texas Instruments Incorporated | Speech analysis/synthesis system with silence suppression |
US4625083A (en) * | 1985-04-02 | 1986-11-25 | Poikela Timo J | Voice operated switch |
US4918732A (en) * | 1986-01-06 | 1990-04-17 | Motorola, Inc. | Frame comparison method for word recognition in high noise environments |
GB8608289D0 (en) * | 1986-04-04 | 1986-05-08 | Pa Consulting Services | Noise compensation in speech recognition |
JPH0748695B2 (en) * | 1986-05-23 | 1995-05-24 | 株式会社日立製作所 | Speech coding system |
US4829578A (en) * | 1986-10-02 | 1989-05-09 | Dragon Systems, Inc. | Speech detection and recognition apparatus for use with background noise of varying levels |
US5212764A (en) * | 1989-04-19 | 1993-05-18 | Ricoh Company, Ltd. | Noise eliminating apparatus and speech recognition apparatus using the same |
US5459814A (en) * | 1993-03-26 | 1995-10-17 | Hughes Aircraft Company | Voice activity detector for speech signals in variable background noise |
1997-07-21: US application US08/897,734, patent US6411928B2 (en), not active, Expired - Fee Related
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040059573A1 (en) * | 2001-02-20 | 2004-03-25 | Hwajin Cheong | Voice command identifier for a voice recognition system |
US20080033722A1 (en) * | 2001-07-10 | 2008-02-07 | American Express Travel Related Services Company, Inc. | Method and system for hand geometry recognition biometrics on a fob |
US20080046379A1 (en) * | 2001-07-10 | 2008-02-21 | American Express Travel Related Services Company, Inc. | System and method for proffering multiple biometrics for use with a fob |
US9336634B2 (en) | 2001-07-10 | 2016-05-10 | Chartoleaux Kg Limited Liability Company | Hand geometry biometrics on a payment device |
US8284025B2 (en) * | 2001-07-10 | 2012-10-09 | Xatra Fund Mx, Llc | Method and system for auditory recognition biometrics on a FOB |
US8289136B2 (en) * | 2001-07-10 | 2012-10-16 | Xatra Fund Mx, Llc | Hand geometry biometrics on a payment device |
US20140343951A1 (en) * | 2001-12-03 | 2014-11-20 | Cisco Technology, Inc. | Simplified Decoding of Voice Commands Using Control Planes |
US6889191B2 (en) * | 2001-12-03 | 2005-05-03 | Scientific-Atlanta, Inc. | Systems and methods for TV navigation with compressed voice-activated commands |
US20050144009A1 (en) * | 2001-12-03 | 2005-06-30 | Rodriguez Arturo A. | Systems and methods for TV navigation with compressed voice-activated commands |
US7321857B2 (en) * | 2001-12-03 | 2008-01-22 | Scientific-Atlanta, Inc. | Systems and methods for TV navigation with compressed voice-activated commands |
US9495969B2 (en) * | 2001-12-03 | 2016-11-15 | Cisco Technology, Inc. | Simplified decoding of voice commands using control planes |
US8738382B1 (en) * | 2005-12-16 | 2014-05-27 | Nvidia Corporation | Audio feedback time shift filter system and method |
US20100177178A1 (en) * | 2009-01-14 | 2010-07-15 | Alan Alexander Burns | Participant audio enhancement system |
US8154588B2 (en) * | 2009-01-14 | 2012-04-10 | Alan Alexander Burns | Participant audio enhancement system |
US20110161084A1 (en) * | 2009-12-29 | 2011-06-30 | Industrial Technology Research Institute | Apparatus, method and system for generating threshold for utterance verification |
US8938081B2 (en) | 2010-07-06 | 2015-01-20 | Dolby Laboratories Licensing Corporation | Telephone enhancements |
US11169773B2 (en) * | 2014-04-01 | 2021-11-09 | TekWear, LLC | Systems, methods, and apparatuses for agricultural data collection, analysis, and management via a mobile device |
EP3125540A1 (en) * | 2015-07-31 | 2017-02-01 | Fujitsu Limited | Information presentation method, information presentation apparatus, and program |
US11302306B2 (en) * | 2015-10-22 | 2022-04-12 | Texas Instruments Incorporated | Time-based frequency tuning of analog-to-information feature extraction |
CN106611596A (en) * | 2015-10-22 | 2017-05-03 | 德克萨斯仪器股份有限公司 | Time-based frequency tuning of analog-to-information feature extraction |
US11605372B2 (en) | 2015-10-22 | 2023-03-14 | Texas Instruments Incorporated | Time-based frequency tuning of analog-to-information feature extraction |
US20180011843A1 (en) * | 2016-07-07 | 2018-01-11 | Samsung Electronics Co., Ltd. | Automatic interpretation method and apparatus |
US10867136B2 (en) * | 2016-07-07 | 2020-12-15 | Samsung Electronics Co., Ltd. | Automatic interpretation method and apparatus |
US20210382964A1 (en) * | 2017-03-01 | 2021-12-09 | Stmicroelectronics (Grenoble 2) Sas | Method and apparatus for processing a histogram output from a detector sensor |
US11797645B2 (en) * | 2017-03-01 | 2023-10-24 | Stmicroelectronics (Research & Development) Limited | Method and apparatus for processing a histogram output from a detector sensor |
US20190029563A1 (en) * | 2017-07-26 | 2019-01-31 | Intel Corporation | Methods and apparatus for detecting breathing patterns |
US11581004B2 (en) | 2020-12-02 | 2023-02-14 | HearUnow, Inc. | Dynamic voice accentuation and reinforcement |
Also Published As
Publication number | Publication date |
---|---|
US6411928B2 (en) | 2002-06-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6411928B2 (en) | Apparatus and method for recognizing voice with reduced sensitivity to ambient noise | |
JP2654503B2 (en) | Wireless terminal | |
US4918735A (en) | Speech recognition apparatus for recognizing the category of an input speech pattern | |
US5146504A (en) | Speech selective automatic gain control | |
EP0077194B1 (en) | Speech recognition system | |
EP0459382B1 (en) | Speech signal processing apparatus for detecting a speech signal from a noisy speech signal | |
US4628526A (en) | Method and system for matching the sound output of a loudspeaker to the ambient noise level | |
US4455676A (en) | Speech processing system including an amplitude level control circuit for digital processing | |
CZ67896A3 (en) | Voice detector | |
EP0487307B1 (en) | Method and system for speech recognition without noise interference | |
EP0614169B1 (en) | Voice signal processing device | |
JP2990051B2 (en) | Voice recognition device | |
JPH08250944A (en) | Automatic sound volume control method and device executing this method | |
JP3410789B2 (en) | Voice recognition device | |
JP2648014B2 (en) | Audio clipping device | |
JP2975808B2 (en) | Voice recognition device | |
JPH0635498A (en) | Device and method for speech recognition | |
JPH0573090A (en) | Speech recognizing method | |
JPH01200294A (en) | Sound recognizing device | |
JPS6272214A (en) | Automatic sound volume adjusting device in on-vehicle audio equipment | |
JP2000075900A (en) | Voice analyzing device | |
JPH06208393A (en) | Voice recognizing device | |
JPS59185394A (en) | Voice recognition equipment | |
JPH04340598A (en) | Voice recognition device | |
JPH04369697A (en) | Voice recognition device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | FPAY | Fee payment | Year of fee payment: 4 |
| | FEPP | Fee payment procedure | Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | REMI | Maintenance fee reminder mailed | |
| | LAPS | Lapse for failure to pay maintenance fees | |
| | STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
| | FP | Lapsed due to failure to pay maintenance fee | Effective date: 20100625 |