KR20150048471A - Method for voice activity detection and communication device implementing the same - Google Patents
- Publication number
- KR20150048471A (application KR1020130128584A)
- Authority
- KR
- South Korea
- Prior art keywords
- microphone
- voice activity
- signal power
- communication device
- ratio
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/84—Detection of presence or absence of voice signals for discriminating voice from noise
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/20—Arrangements for obtaining desired frequency or directional characteristics
- H04R1/22—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired frequency characteristic only
- H04R1/222—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired frequency characteristic only for microphones
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Otolaryngology (AREA)
- Telephone Function (AREA)
Abstract
The present invention relates to a voice activity detection method for estimating and removing noise in a communication device using two microphones, and to a communication device employing the method. More particularly, it relates to a method for detecting voice activity in a communication device including a first microphone and a second microphone located relatively farther from the speaker's mouth than the first microphone, the method comprising: calculating the frequency components of the input signals from the first microphone and the second microphone through a Fourier transform; calculating the frequency signal power of each microphone from its frequency components, and calculating the ratio of the two signal powers by comparing the frequency signal power of the first microphone with that of the second microphone; and determining the presence or absence of voice activity based on the calculated ratio of the two signal powers. According to the present invention, because the presence or absence of voice activity is judged from the ratio of the signal powers of the two microphones, the decision is less affected by the absolute magnitude of the signals arriving at the two microphones.
Description
BACKGROUND OF THE INVENTION
Voice activity detection (VAD) methods using a single microphone are widely used in communication devices such as mobile phones to reduce background noise, increase channel capacity, or reduce battery consumption.
For example, in some code division multiple access (CDMA) systems, VAD is used to minimize the effective radio spectrum used, thereby providing more system capacity. In addition, GSM communication systems use VAD to reduce co-channel interference and to reduce battery consumption in subscriber units.
However, these typical single-microphone VAD systems analyze only the acoustic information received by a single microphone, which greatly limits their capability. In particular, their performance limits become apparent when the signal has a low signal-to-noise ratio (SNR) or when the background noise changes rapidly.
To solve this problem, devices that remove noise using two microphones have been proposed. For example, Korean Patent Publication No. 10-2004-0101373 discloses a communication device comprising one omnidirectional microphone, one unidirectional microphone, one or more skin surface microphone sensors in contact with the user's skin, and a voice activity sensor that outputs a control signal by processing the voice activity signal of the skin surface microphone.
However, such a configuration is difficult to apply generally because the hardware configuration is complicated and the algorithm is also dependent on such hardware configuration.
SUMMARY OF THE INVENTION
It is an object of the present invention to provide a voice activity detection method that is generally applicable to communication devices using two microphones, and a communication device employing the method.
According to an aspect of the present invention, there is provided a method for detecting voice activity in a communication device including a first microphone and a second microphone located relatively farther from the speaker's mouth than the first microphone, the method comprising: calculating the frequency components of the input signals from the first microphone and the second microphone through a Fourier transform; calculating the frequency signal power of each microphone from its frequency components, and calculating the ratio of the two signal powers by comparing the frequency signal power of the first microphone with that of the second microphone; and determining the presence or absence of voice activity based on the calculated ratio of the two signal powers.
As an example, when the frequency signal power of the second microphone is larger than that of the first microphone, the ratio A(l) of the two signal powers in the l-th frame is calculated [...], and if A(l) is larger than a threshold value between 0 and 1, the l-th frame is estimated to be a noise period.
As another example, when the frequency signal power of the first microphone is larger than that of the second microphone, the ratio B(l) of the two signal powers in the l-th frame is calculated [...], and if B(l) is smaller than a threshold value between 0 and 1, the l-th frame is estimated to be a noise period.
If the ratio of the two signal powers is not estimated to be a noise period through comparison with the threshold value, it is determined that voice activity is present, and a signal indicating voice activity is output.
Here, in calculating the ratio of both signal powers, smoothing can be performed on the ratio of the power of both signals in order to protect the voice activity interval.
Meanwhile, the communication apparatus according to the present invention is characterized by including a voice activity detector for determining the presence or absence of voice activity by the above-described voice activity detection method.
Accordingly, the voice activity detection method of the present invention and the communication device adopting it are applied to a communication device provided with two microphones close to each other. Because noise generally originates relatively far from the speaker's mouth, it is input to both microphones with almost the same signal power. In contrast, the microphone close to the speaker's mouth receives the speaker's voice with a higher signal power than the microphone farther away. Using this difference, voice activity can be detected more accurately.
In addition, because voice activity is detected using only two microphones, the present invention provides an efficient voice activity detection method that is generally applicable to communication devices using two microphones.
FIG. 1 is a block diagram showing the internal configuration of a communication device to which the voice activity detection method of the present invention is applied.
FIG. 2 is a diagram showing an example of the arrangement of the microphones and speaker in a communication device having two microphones.
FIG. 3 is a diagram for explaining the operation of a voice activity detection method according to a preferred embodiment of the present invention.
FIG. 4 is a diagram showing an example of the frequency signal power in the l-th frame.
Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the drawings.
FIG. 1 is a block diagram illustrating the internal configuration of a communication device to which the voice activity detection method of the present invention is applied.
As shown in FIG. 1, the voice activity detection method of the present invention is applied to a communication device including at least two [...]
Preferably, the [...]
The analog signals input to the [...]
The decoded voice signal is converted into an analog signal by a digital-to-analog converter (22), amplified to an appropriate level, and output through the speaker (21), enabling a conversation between the parties.
FIG. 2 is a diagram showing an example of the arrangement of the microphones and speaker in a communication device having two microphones. As shown in FIG. 2, the [...]
FIG. 3 is a diagram for explaining the operation of a voice activity detection method according to a preferred embodiment of the present invention, and FIG. 4 is a diagram illustrating an example of the frequency signal power in the l-th frame. The operation of FIG. 3 is preferably performed for each frame, but the present invention is not limited thereto.
First, a voice signal input from the two [...]
If the input voice signal of the first microphone is d(n) and that of the second microphone is x(n), then D(k), the Fourier transform of d(n), and X(k), the Fourier transform of x(n), are derived by Equation (1).
Here, N is the number of samples in the block covering the preceding predetermined period up to and including the current sample, and 0 ≤ n ≤ N-1, 0 ≤ k ≤ N-1.
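Equation (1) itself is not reproduced in this text. Assuming it is the standard N-point discrete Fourier transform, which matches the index ranges given above, it would take the following form. The sign convention and the absence of a normalization factor are assumptions, not taken from the source.

```latex
D(k) = \sum_{n=0}^{N-1} d(n)\, e^{-j 2\pi n k / N}, \qquad
X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}, \qquad 0 \le k \le N-1
```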
Then, the [...]
As described above, when two or [...]
On the contrary, the voice of the caller is input to the [...]
In the present invention, it is possible to estimate the noise interval using this phenomenon.
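As a rough illustration of the per-frame spectrum and power computation, the following Python sketch uses only the standard library. Equation (2) is not reproduced in this text, so taking the power of each bin as the squared magnitude of the DFT coefficient is an assumption, and the names `dft` and `bin_power` are hypothetical.

```python
import cmath
import math

def dft(frame):
    """N-point DFT of one frame of real samples; O(N^2), for illustration only."""
    n_samples = len(frame)
    return [
        sum(frame[n] * cmath.exp(-2j * math.pi * n * k / n_samples)
            for n in range(n_samples))
        for k in range(n_samples)
    ]

def bin_power(spectrum):
    """Per-bin power, assumed here to be the squared magnitude of each coefficient."""
    return [abs(c) ** 2 for c in spectrum]

# Toy frame: the same one-cycle sine at both microphones,
# twice as strong at the first (near) microphone.
N = 8
x = [math.sin(2 * math.pi * n / N) for n in range(N)]  # second (far) microphone
d = [2.0 * s for s in x]                               # first (near) microphone

power_d = bin_power(dft(d))
power_x = bin_power(dft(x))
```

In this toy frame, doubling the amplitude at the first microphone quadruples its power in every bin, which is the kind of level difference the method relies on for speech.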
When D(k) < X(k) for the values of D(k) and X(k) derived from Equation (2), the ratio A(l) of D(k) and X(k) of the two [...]
Here, N is the number of samples in one block and l is the frame index.
In general, noise originates relatively far from the speaker's mouth, so frequency signal power of almost the same level can be input to the two [...]
Accordingly, the [...]
The threshold value Thr_A is an optimal setting obtained from repeated experiments. The present invention is not limited to this value, and Thr_A may obviously vary with the structure of the communication device and the characteristics of the microphones.
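The equation for A(l) is not reproduced in this text. The sketch below assumes that A(l) is the frame's total first-microphone power divided by its total second-microphone power, which keeps A(l) at or below 1 in this branch. That form, the function names, and the threshold value 0.8 are all hypothetical; only the comparison A(l) > Thr_A comes from the text.

```python
def ratio_a(power_d, power_x):
    """Assumed form of A(l): total first-mic power over total second-mic power."""
    total_d = sum(power_d)
    total_x = sum(power_x)
    if total_x == 0.0:
        raise ValueError("second-microphone frame has zero power")
    return total_d / total_x

def is_noise_by_a(a_value, thr_a):
    """Rule stated in the text: the frame is estimated as noise when A(l) > Thr_A."""
    return a_value > thr_a

THR_A = 0.8  # hypothetical; the text only says the threshold lies between 0 and 1

# Far-field noise reaches both microphones at almost the same level,
# so the assumed ratio stays close to 1.
a = ratio_a([0.95, 1.9, 0.95], [1.0, 2.0, 1.0])
noise = is_noise_by_a(a, THR_A)
```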
Conversely, when D(k) > X(k) for the values of D(k) and X(k) derived from Equation (2), that is, if the frequency signal power of the [...]
Here, N is the number of samples in one block and l is the frame index.
In this embodiment, since the speaker's voice is input at a relatively large level to the [...]
Therefore, the l-th frame can be estimated as a noise period when B(l) < Thr_B is satisfied (S310-2, S400). Here, the threshold value Thr_B is a value between 0 and 1.
Thr_B is likewise an optimal setting obtained from repeated experiments. The present invention is not limited to this value, and Thr_B may obviously vary with the structure of the communication device and the characteristics of the microphones.
That is, as shown in FIG. 4, B(l) takes a large value in periods without voice activity and a relatively small value in periods with voice activity. [...] so that the noise is removed.
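The two threshold rules can be combined into a single per-frame decision. The sketch below takes the ratio as an already computed number, because the equations for A(l) and B(l) are not reproduced in this text. It follows the comparison directions exactly as stated (A(l) > Thr_A or B(l) < Thr_B means noise). `Branch`, `detect_voice_activity`, and the default thresholds are hypothetical.

```python
from enum import Enum

class Branch(Enum):
    A = "second-mic power larger"  # the D(k) < X(k) case
    B = "first-mic power larger"   # the D(k) > X(k) case

def detect_voice_activity(ratio, branch, thr_a=0.8, thr_b=0.5):
    """Return True when the frame is NOT estimated as a noise period."""
    if branch is Branch.A:
        is_noise = ratio > thr_a  # A(l) > Thr_A -> noise period
    else:
        is_noise = ratio < thr_b  # B(l) < Thr_B -> noise period
    return not is_noise
```

A frame that is not classified as noise by its branch is treated as containing voice activity, matching the statement that voice activity is declared whenever the ratio is not estimated as a noise period.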
Through the above-described two embodiments, the [...]
According to a preferred embodiment of the present invention, in order to protect the voice activity interval when estimating the noise interval by the [...]
For example, in the case of A(l):
A(l) = α1 * A(l) + (1 - α1) * A(l-1)
A(l) = α2 * A(l) + (1 - α2) * A(l-1)
Here, 0 < [...]
When smoothing is performed as described above, A(l) increases slowly and decreases rapidly.
Conversely, in the case of B(l):
B(l) = β1 * B(l) + (1 - β1) * B(l-1)
B(l) = β2 * B(l) + (1 - β2) * B(l-1)
Here, 1 > β1 > β2 > 0.
When smoothing is performed as described above, B(l) increases relatively faster than A(l) and decreases slowly.
Here, the smoothing method can be variously performed through known techniques and the like, and is not limited to the examples shown in the present invention.
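A minimal sketch of this kind of recursive smoothing follows. The condition that selects between the two coefficients is not legible in this text, so the sketch assumes the first coefficient applies when the raw ratio rises above the previous smoothed value and the second when it does not. The coefficient values and function names are hypothetical.

```python
def smooth(raw, previous, coef_rise, coef_fall):
    """One step of first-order recursive smoothing with a direction-dependent weight."""
    coef = coef_rise if raw > previous else coef_fall
    return coef * raw + (1.0 - coef) * previous

def smooth_sequence(raw_values, coef_rise, coef_fall):
    """Smooth a non-empty sequence of per-frame ratios, seeded with the first raw value."""
    smoothed = [raw_values[0]]
    for raw in raw_values[1:]:
        smoothed.append(smooth(raw, smoothed[-1], coef_rise, coef_fall))
    return smoothed

# Hypothetical coefficients: A(l) rises slowly and falls quickly,
# while B(l) rises quickly and falls slowly.
raw = [0.0, 1.0, 1.0, 0.0]
a_smoothed = smooth_sequence(raw, coef_rise=0.25, coef_fall=0.75)
b_smoothed = smooth_sequence(raw, coef_rise=0.75, coef_fall=0.25)
```

With the same raw input, the A-style sequence lags on the way up and drops quickly, while the B-style sequence does the opposite, which is the asymmetry the text describes.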
Thereafter, if the ratio of the two signal powers in the [...]
The noise-period estimation signal and the voice activity signal are transmitted to the [...]
It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention. Therefore, the technical scope of the present invention should not be limited to the contents described in the detailed description of the specification, but should be defined by the claims.
11: first microphone
12: second microphone
15: Voice Activity Detector
16: Noise canceling
Claims (7)
Calculating a frequency component of an input signal input from the first microphone and the second microphone through a Fourier transform;
Calculating the frequency signal power of each microphone from its frequency components, and calculating the ratio of the two signal powers by comparing the frequency signal power of the first microphone with that of the second microphone; and
determining the presence or absence of voice activity based on the calculated ratio of the two signal powers.
If the frequency signal power of the second microphone is larger than that of the first microphone, the ratio A(l) of the two signal powers in the l-th frame is calculated [...],
and if A(l) is larger than a threshold value between 0 and 1 in the l-th frame, the frame is estimated to be a noise period.
When the frequency signal power of the first microphone is larger than that of the second microphone, the ratio B(l) of the two signal powers in the l-th frame is calculated [...],
and if B(l) is smaller than a threshold value between 0 and 1 in the l-th frame, the frame is estimated to be a noise period.
If the ratio of the two signal powers is not estimated to be a noise period through comparison with the threshold value, it is determined that voice activity is present, and a signal indicating voice activity is output.
Wherein smoothing is performed on the ratio of the two signal powers in order to protect the voice activity period when calculating the ratio.
Wherein the first microphone is located at a lower end of the communication device and the second microphone is located at an upper end of the communication device.
The communication device according to any one of claims 1 to 6, further comprising a voice activity detector for determining the presence or absence of voice activity by the voice activity detection method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020130128584A KR20150048471A (en) | 2013-10-28 | 2013-10-28 | Method for voice activity detection and communication device implementing the same |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020130128584A KR20150048471A (en) | 2013-10-28 | 2013-10-28 | Method for voice activity detection and communication device implementing the same |
Publications (1)
Publication Number | Publication Date |
---|---|
KR20150048471A (en) | 2015-05-07 |
Family
ID=53386960
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020130128584A KR20150048471A (en) | 2013-10-28 | 2013-10-28 | Method for voice activity detection and communication device implementing the same |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR20150048471A (en) |
-
2013
- 2013-10-28 KR KR1020130128584A patent/KR20150048471A/en not_active Application Discontinuation
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
E902 | Notification of reason for refusal | ||
E601 | Decision to refuse application |