KR20150048471A - Method for voice activity detection and communication device implementing the same

Info

Publication number: KR20150048471A
Application number: KR1020130128584A
Authority: KR
Inventors: 문석용; 조정권
Original assignee: (주)인피니티텔레콤; 주식회사 시그테크
Priority date: 2013-10-28
Filing date: 2013-10-28
Publication date: 2015-05-07

Abstract

The present invention relates to a voice activity detection method for estimating and removing noise in a communication apparatus using two microphones, and to a communication apparatus employing the method. More particularly, it provides a method for detecting voice activity in a communication device including a first microphone and a second microphone located relatively far from the speaker's mouth compared with the first microphone, the method comprising: calculating the frequency components of the input signals from the first microphone and the second microphone through a Fourier transform; calculating the frequency signal power from the frequency components of each microphone, and calculating the ratio of the two signal powers by comparing the frequency signal power of the first microphone with that of the second microphone; and determining the presence or absence of voice activity based on the calculated ratio of the two signal powers. According to the present invention, because the presence or absence of voice activity is judged from the ratio of the signal powers of the two microphones, the decision is less affected by the absolute magnitude of the signals arriving at the two microphones.

Description

TECHNICAL FIELD [0001] The present invention relates to a voice activity detection method and a communication device employing the method.

BACKGROUND OF THE INVENTION 1. Field of the Invention [0001] The present invention relates to a voice activity detection (VAD) method and a communication apparatus employing the method, and more particularly to a voice activity detection method for estimating and removing noise in a communication device using two microphones, and a communication apparatus adopting the method.

Voice activity detection (VAD) using a single microphone is widely used in communication devices such as mobile phones to reduce background noise, increase channel capacity, or reduce battery consumption.

For example, some code division multiple access (CDMA) systems use VAD to minimize the effective radio spectrum used, thereby providing more system capacity. GSM communication systems likewise use VAD to reduce co-channel interference and to reduce battery consumption in subscriber units.

However, these typical single-microphone VAD systems analyze only the acoustic information received by a single microphone, which greatly limits their performance. Specifically, the limits of single-microphone VAD systems become apparent when the signal has a low signal-to-noise ratio (SNR) or when the background noise changes rapidly.

To solve this problem, devices that remove noise using two microphones have been proposed. For example, Korean Patent Publication No. 10-2004-0101373 discloses a communication device comprising one omnidirectional microphone, one unidirectional microphone, one or more skin surface microphone sensors in contact with the user's skin, and a voice activity sensor that outputs a control signal by processing the voice activity signal of the skin surface microphone.

However, such a configuration is difficult to apply generally because the hardware configuration is complicated and the algorithm is also dependent on such hardware configuration.

Korean Patent Publication No. 10-2004-0101373

SUMMARY OF THE INVENTION It is an object of the present invention to provide a voice activity detection method which is generally applicable to a communication apparatus using two microphones and a communication apparatus employing the method.

According to an aspect of the present invention, there is provided a method for detecting voice activity in a communication device including a first microphone and a second microphone relatively far from the mouth of a speaker as compared to the first microphone, the method comprising: calculating the frequency components of the input signals from the first microphone and the second microphone through a Fourier transform; calculating the frequency signal power from the frequency components of each microphone, and calculating the ratio of the two signal powers by comparing the frequency signal power of the first microphone with the frequency signal power of the second microphone; and determining the presence or absence of voice activity based on the calculated ratio of the two signal powers.

As an example, when the frequency signal power of the second microphone is greater than the frequency signal power of the first microphone, the ratio A(l) of the two signal powers in the l-th frame is calculated, and if A(l) is greater than a threshold value between 0 and 1 in the l-th frame, the frame is estimated to be a noise period.

As another example, when the frequency signal power of the first microphone is greater than the frequency signal power of the second microphone, the ratio B(l) of the two signal powers in the l-th frame is calculated, and if B(l) is smaller than a threshold value between 0 and 1 in the l-th frame, the frame is estimated to be a noise period.

If the ratio of the two signal powers is not estimated as a noise period through comparison analysis with the threshold value, it is determined that there is a voice activity and a signal indicating that there is a voice activity is output.

Here, in calculating the ratio of both signal powers, smoothing can be performed on the ratio of the power of both signals in order to protect the voice activity interval.

Meanwhile, the communication apparatus according to the present invention is characterized by including a voice activity detector for determining the presence or absence of voice activity by the above-described voice activity detection method.

The voice activity detection method and the communication apparatus adopting the method of the present invention are therefore applied to a communication apparatus provided with two microphones close to each other. A noise signal originating relatively far from the speaker's mouth is input to both microphones with almost the same signal power, whereas the microphone close to the speaker's mouth receives the speaker's voice with a higher signal power than the microphone farther away. Using this difference, voice activity can be detected more accurately.

In addition, according to the present invention, because voice activity is detected using only two microphones, an efficient voice activity detection method that is generally applicable to communication devices using two microphones can be implemented.

FIG. 1 is a block diagram showing an internal configuration of a communication device to which the voice activity detection method of the present invention is applied.
FIG. 2 is a view showing an example of the arrangement of the microphones and the speaker in a communication device having two microphones.
FIG. 3 is a view for explaining an operation of a voice activity detection method according to a preferred embodiment of the present invention.
FIG. 4 is a diagram showing an example of the frequency signal power in the l-th frame.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a block diagram illustrating an internal configuration of a communication device to which the voice activity detection method of the present invention is applied.

As shown in FIG. 1, the voice activity detection method of the present invention is applied to a communication device including at least two microphones 11 and 12. The first microphone (main microphone) 11 is located close to the mouth of the speaker, and the second microphone (sub microphone) 12 is located relatively far from the mouth of the speaker.

Preferably, the first microphone 11 is located at the lower end of the communication device, and the second microphone 12 is located at the upper end of the communication device.

The analog signals input to the respective microphones 11 and 12 are amplified to an appropriate level, converted into digital signals by the analog-to-digital converters 13 and 14, and passed to the voice activity detector 15.

The voice activity detector 15 compares and analyzes the frequency signal power of the first microphone 11 and the frequency signal power of the second microphone 12 to calculate the ratio of the two signal powers, so that signals other than voice activity in the communication device can be estimated as noise and removed.

The noise eliminator 16 removes ambient noise using the signal indicating the presence or absence of voice activity from the voice activity detector 15 and the digital signals obtained from the microphones 11 and 12.

The vocoder 31 encodes the noise-removed signal from the noise eliminator 16 and transmits the encoded signal to the other party through the communication interface 41, and it decodes the other party's voice received through the communication interface 41.

The decoded voice signal is converted into an analog signal by the digital-to-analog converter 22, amplified to an appropriate level, and output through the speaker 21, so that the two parties can converse.

FIG. 2 is a diagram showing an example of the arrangement of the microphones and the speaker in a communication device having two microphones. As shown in FIG. 2, the first microphone 11 is located at the lower end of the communication device, near the mouth of the user. The second microphone 12 is located relatively far from the mouth of the user, for example at the upper end of the communication device. The speaker 21 is located at the upper end of the communication device, near the user's ear.

FIG. 3 is a diagram for explaining an operation of a voice activity detection method according to a preferred embodiment of the present invention, and FIG. 4 is a diagram illustrating an example of a frequency signal power in a l-th frame. The operation of FIG. 3 is preferably performed for each frame, but the present invention is not limited thereto.

First, a voice signal input from the two microphones 11 and 12 is converted into a digital signal by the analog-digital converters 13 and 14, and the digital signal is input to the voice activity detector 15.

The voice activity detector 15 performs a Fourier transform (FFT) on each voice signal to obtain the frequency components of the voice signal of the first microphone 11 and of the voice signal of the second microphone 12 (S100).

If the input voice signal of the first microphone is d(n) and the input voice signal of the second microphone is x(n), then d(k), the Fourier transform of d(n), and x(k), the Fourier transform of x(n), are derived by the following Equation (1).

Here, N is the number of samples in the block of the preceding predetermined period, including the current sample, over which the Fourier transform is taken, with 0 ≤ n ≤ N-1 and 0 ≤ k ≤ N-1.
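Assuming Equation (1) is the standard N-point discrete Fourier transform (the sign and normalization conventions shown here are an assumption, not taken from the source), it can be written as:

```latex
d(k) = \sum_{n=0}^{N-1} d(n)\, e^{-j 2\pi k n / N}, \qquad
x(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \qquad 0 \le k \le N-1
```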

Then, the voice activity detector 15 calculates the frequency signal powers of the two microphones 11 and 12. Letting D(k) and X(k) denote the frequency signal powers of the k-th frequency components d(k) and x(k), respectively, D(k) and X(k) are derived by Equation (2).
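The transform and per-bin power steps can be sketched in Python as follows. This is a minimal illustration, assuming that the frequency signal power is the squared magnitude of each DFT bin, which the text does not state explicitly. The function names `dft` and `bin_powers` are hypothetical. A direct O(N²) DFT keeps the sketch dependency-free; a real device would use an FFT.

```python
import cmath


def dft(frame):
    """Direct N-point DFT of one frame of real samples (O(N^2), for illustration)."""
    n_samples = len(frame)
    return [
        sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / n_samples)
            for n in range(n_samples))
        for k in range(n_samples)
    ]


def bin_powers(frame):
    """Per-bin power, assumed here to be |d(k)|^2 (an assumption, not from the source)."""
    return [abs(c) ** 2 for c in dft(frame)]


# Example: a constant frame puts all of its power in bin k = 0.
D = bin_powers([1.0, 1.0, 1.0, 1.0])
```

Applying `bin_powers` to the frame from each microphone gives the two power sequences that the later ratio computation compares.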

As described above, when two or more microphones 11 and 12 are located close to each other in a communication device such as a smartphone, a noise signal originating relatively far from the mouth of the speaker is input to both microphones with almost the same signal power.

In contrast, the voice of the speaker is input to the first microphone 11, located near the mouth of the speaker, with a higher frequency signal power than to the second microphone 12.

In the present invention, it is possible to estimate the noise interval using this phenomenon.
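A rough numerical illustration of this phenomenon follows. It assumes free-field inverse-square spreading from a point source, a simplification the source does not make, and the distances are hypothetical.

```python
def power_ratio(dist_main_m, dist_sub_m):
    """Ratio of sub-mic power to main-mic power for a point source,
    assuming power falls off as 1 / distance^2 (free-field assumption)."""
    return (dist_main_m / dist_sub_m) ** 2


# Hypothetical geometry: microphones 10 cm apart along the source direction.
noise_ratio = power_ratio(2.00, 2.10)   # distant noise source, about 2 m away
voice_ratio = power_ratio(0.05, 0.15)   # speaker's mouth, 5 cm from the main mic
```

Under these assumptions the distant noise arrives with nearly equal power on both microphones (a ratio close to 1), while the voice is several times stronger on the main microphone. Neither ratio depends on how loud the source is, which is why a ratio-based decision is less sensitive to absolute signal level.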

When D(k) < X(k) among the D(k) and X(k) values derived from Equation (2), that is, when the frequency signal power of the second microphone 12 is greater than the frequency signal power of the first microphone 11, the ratio A(l) of the frequency signal powers D(k) and X(k) of the two microphones 11 and 12 in the l-th frame is derived by the following Equation (3).

Where N is the number of samples in one block and l is the frame index.

In general, noise reaches the microphones from a position relatively far from the mouth of the speaker, so it is input to the two microphones 11 and 12 with almost the same frequency signal power. The value of A(l) is then close to "1".

Accordingly, the voice activity detector 15 according to the present embodiment estimates the noise interval by comparing the ratio A(l) of D(k) and X(k) derived from Equation (3) with a threshold value (S310-1): when A(l) > Thr_A is satisfied in the l-th frame, the frame is estimated to be a noise period (S400). Here, the threshold value Thr_A is a value between 0 and 1.

The threshold value Thr_A is an optimal set value obtained from the results of repeated experiments, and the present invention is not limited to this, and it is obvious that the threshold value Thr_A may vary depending on the structure of the communication device and the characteristics of the microphone.
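The comparison in step S310-1 can be sketched as below. Equation (3) is not reproduced in the text, so, purely as an assumption for illustration, A(l) is taken to be the ratio of summed bin powers, ΣD(k) / ΣX(k). The threshold 0.8 is a placeholder rather than a value from the source, and the names `ratio_a` and `is_noise_a` are hypothetical.

```python
def ratio_a(D, X):
    """Illustrative stand-in for Equation (3): sum of D(k) over sum of X(k).
    The true form of the equation is an assumption here."""
    total_x = sum(X)
    if total_x == 0.0:
        return 0.0  # silent frame: ratio undefined, 0.0 is an arbitrary convention
    return sum(D) / total_x


def is_noise_a(D, X, thr_a=0.8):
    """Step S310-1: estimate a noise period when A(l) > Thr_A.
    thr_a is a placeholder; the source only says 0 < Thr_A < 1."""
    return ratio_a(D, X) > thr_a


# Nearly equal powers on both microphones: ratio close to 1, flagged as noise.
noisy = is_noise_a([0.9, 1.0, 0.95], [1.0, 1.1, 1.0])
# Second microphone much stronger: small ratio, not flagged as noise.
other = is_noise_a([0.1, 0.2, 0.1], [1.0, 1.1, 1.0])
```

Because the threshold depends on the device structure and the microphone characteristics, it is exposed as a parameter rather than fixed in the function.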

Conversely, when D(k) > X(k) among the D(k) and X(k) values derived from Equation (2), that is, when the frequency signal power of the first microphone 11 is greater than the frequency signal power of the second microphone 12, the ratio B(l) of the frequency signal powers D(k) and X(k) of the two microphones 11 and 12 in the l-th frame is derived by the following Equation (4).

Where N is the number of samples in one block and l is the frame index.

In the present embodiment, because the speaker's voice is input to the first microphone 11 at a higher level than to the second microphone 12, B(l) in this case takes a value smaller than "1", as with A(l).

Therefore, the l-th frame can be estimated to be a noise period (S400) when B(l) < Thr_B is satisfied (S310-2). Here, the threshold value Thr_B is a value between 0 and 1.

Here, the threshold value Thr_B is also an optimum set value obtained from the result of repeated experiments, and the present invention is not limited thereto, and it is obvious that the threshold value Thr_B may vary depending on the structure of the communication device and the characteristics of the microphone.

That is, as shown in FIG. 4, B(l) takes a large value in a period without voice activity and a relatively small value in a period with voice activity, which allows the noise to be removed.

Through the above-described two embodiments, the voice activity detector 15 distinguishes between the voice activity period and the noise period, and only the noise signal can be extracted and removed.

According to a preferred embodiment of the present invention, in order to protect the voice activity interval when the voice activity detector 15 estimates the noise interval, smoothing is performed on each of A(l) and B(l).

For example, in the case of A(l),

A(l) = α1 * A(l) + (1 - α1) * A(l-1)

A(l) = α2 * A(l) + (1 - α2) * A(l-1)

Here, 0 < α1 < α2 < 1.

When smoothing is performed as described above, A(l) increases slowly and decreases rapidly.

Conversely, in the case of B(l),

B(l) = β1 * B(l) + (1 - β1) * B(l-1)

B(l) = β2 * B(l) + (1 - β2) * B(l-1)

Here, 1 > β1 > β2 > 0.

When smoothing is performed as described above, B (l) increases relatively faster than A (l) and slowly decreases.
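The asymmetric smoothing described above can be sketched as follows. The source gives two update formulas per ratio but does not state the condition that selects between them. This sketch assumes the first coefficient applies when the ratio is rising and the second when it is falling, which matches the stated behavior (A(l) rises slowly and falls quickly; B(l) rises quickly and falls slowly). The coefficient values are placeholders, and taking the previous term to be the previous smoothed value is also an assumption.

```python
def smooth(current, previous, coeff_rise, coeff_fall):
    """One step of first-order smoothing with a direction-dependent weight.

    Assumption: coeff_rise is used when the raw value exceeds the previous
    smoothed value, coeff_fall otherwise.
    """
    coeff = coeff_rise if current > previous else coeff_fall
    return coeff * current + (1.0 - coeff) * previous


# Placeholder coefficients satisfying 0 < a1 < a2 < 1 and 1 > b1 > b2 > 0.
ALPHA_1, ALPHA_2 = 0.1, 0.9
BETA_1, BETA_2 = 0.9, 0.1

a_up = smooth(1.0, 0.0, ALPHA_1, ALPHA_2)    # A rising: moves only 10% of the way
a_down = smooth(0.0, 1.0, ALPHA_1, ALPHA_2)  # A falling: moves 90% of the way
b_up = smooth(1.0, 0.0, BETA_1, BETA_2)      # B rising: moves 90% of the way
b_down = smooth(0.0, 1.0, BETA_1, BETA_2)    # B falling: moves only 10% of the way
```

A single helper serves both ratios because the two cases differ only in which coefficient is the larger one.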

Here, the smoothing method can be variously performed through known techniques and the like, and is not limited to the examples shown in the present invention.

Thereafter, if the voice activity detector 15 does not estimate the frame to be a noise period through the comparison of the ratio of the two signal powers with the threshold value, it determines that there is voice activity and outputs a signal indicating that voice activity is present (S500).
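Taken together, the per-frame decision of FIG. 3 might be organized as in the sketch below. It takes an already computed (and optionally smoothed) ratio as input and applies the comparisons exactly as the text states them. The function name, argument names, and threshold defaults are hypothetical.

```python
def detect_voice_activity(second_mic_stronger, ratio, thr_a=0.8, thr_b=0.2):
    """Return True when voice activity is judged present for one frame.

    second_mic_stronger: True for the D(k) < X(k) case (ratio is A(l)),
                         False for the D(k) > X(k) case (ratio is B(l)).
    thr_a, thr_b: placeholder thresholds; the source only requires 0 < Thr < 1.
    """
    if second_mic_stronger:
        noise = ratio > thr_a      # S310-1: A(l) > Thr_A means noise period
    else:
        noise = ratio < thr_b      # S310-2: B(l) < Thr_B means noise period
    return not noise               # S500: otherwise voice activity is present


frame_a = detect_voice_activity(True, 0.95)   # A(l) above Thr_A
frame_b = detect_voice_activity(False, 0.60)  # B(l) not below Thr_B
```

Here `frame_a` is judged a noise period and `frame_b` is judged to contain voice activity; the resulting flag is what would be passed on to the noise eliminator 16.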

The signal indicating an estimated noise period and the voice activity signal are transmitted to the noise eliminator 16, which removes the noise by selectively processing the signals input from the microphones 11 and 12 on the basis of these signals.

It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention. Therefore, the technical scope of the present invention should not be limited to the contents described in the detailed description of the specification, but should be defined by the claims.

11: first microphone
12: second microphone
15: voice activity detector
16: noise eliminator

Claims

A method for detecting a voice activity in a communication device comprising a first microphone and a second microphone relatively far from the mouth of the speaker as compared to the first microphone,
Calculating a frequency component of an input signal input from the first microphone and the second microphone through a Fourier transform;
Calculating a frequency signal power from the frequency component of each microphone, and calculating a ratio of the two signal powers by comparing and analyzing the frequency signal power of the first microphone and the frequency signal power of the second microphone; and
Determining whether there is voice activity based on the calculated ratio of the two signal powers.

The method according to claim 1,
Wherein, when the frequency signal power of the second microphone is greater than the frequency signal power of the first microphone, the ratio A(l) of the two signal powers in the l-th frame is calculated, and if A(l) is greater than a threshold value between 0 and 1 in the l-th frame, the frame is estimated to be a noise period.

The method according to claim 1,
Wherein, when the frequency signal power of the first microphone is greater than the frequency signal power of the second microphone, the ratio B(l) of the two signal powers in the l-th frame is calculated, and if B(l) is smaller than a threshold value between 0 and 1 in the l-th frame, the frame is estimated to be a noise period.

The method according to claim 2 or 3,
Wherein, if the frame is not estimated to be a noise period through the comparison of the ratio of the two signal powers with the threshold value, it is determined that there is voice activity and a signal indicating that there is voice activity is output.

The method according to claim 2 or 3,
Wherein, in calculating the ratio of the two signal powers, smoothing is performed on the ratio in order to protect the voice activity period.

The method according to claim 1,
Wherein the first microphone is located at a lower end of the communication device and the second microphone is located at an upper end of the communication device.

A communication device comprising a first microphone and a second microphone relatively far from the mouth of the speaker as compared to the first microphone,
The communication device further comprising a voice activity detector for determining the presence or absence of voice activity by the voice activity detection method of any one of claims 1 to 6.