
KR20150048471A - Method for voice activity detection and communication device implementing the same - Google Patents

Method for voice activity detection and communication device implementing the same

Info

Publication number
KR20150048471A
Authority
KR
South Korea
Prior art keywords
microphone
voice activity
signal power
communication device
ratio
Prior art date
Application number
KR1020130128584A
Other languages
Korean (ko)
Inventor
문석용
조정권
Original Assignee
(주)인피니티텔레콤
주식회사 시그테크
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by (주)인피니티텔레콤, 주식회사 시그테크
Priority to KR1020130128584A
Publication of KR20150048471A

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78Detection of presence or absence of voice signals
    • G10L25/84Detection of presence or absence of voice signals for discriminating voice from noise
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00Details of transducers, loudspeakers or microphones
    • H04R1/20Arrangements for obtaining desired frequency or directional characteristics
    • H04R1/22Arrangements for obtaining desired frequency or directional characteristics for obtaining desired frequency characteristic only 
    • H04R1/222Arrangements for obtaining desired frequency or directional characteristics for obtaining desired frequency characteristic only  for microphones
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Otolaryngology (AREA)
  • Telephone Function (AREA)

Abstract

The present invention relates to a voice activity detection method for estimating and removing noise in a communication device using two microphones, and to a communication device employing the method. More particularly, it provides a method for detecting voice activity in a communication device including a first microphone and a second microphone located relatively far from the speaker's mouth compared with the first microphone, the method comprising: calculating, through a Fourier transform, the frequency components of the input signals from the first microphone and the second microphone; calculating a frequency signal power from the frequency components of each microphone, and calculating a ratio of both signal powers by comparing and analyzing the frequency signal power of the first microphone and the frequency signal power of the second microphone; and determining the presence or absence of voice activity based on the calculated ratio of both signal powers. According to the present invention, the presence or absence of voice activity is judged from the ratio of the signal powers of the two microphones, so the decision is less affected by the magnitude of the signals arriving at the two microphones.

Description

TECHNICAL FIELD The present invention relates to a voice activity detection (VAD) method and a communication apparatus employing the method, and more particularly to a voice activity detection method for estimating and removing noise in a communication device using two microphones, and a communication apparatus adopting the method.

BACKGROUND OF THE INVENTION A voice activity detection (VAD) method using one microphone is widely used to reduce background noise in a communication device such as a mobile phone, and to increase channel capacity or reduce battery consumption.

For example, in some code division multiple access (CDMA) systems, VAD is used to minimize the effective radio spectrum used, thereby providing more system capacity. In addition, GSM communication systems use VAD to reduce co-channel interference and to reduce battery consumption in subscriber units.

However, these typical single-microphone VAD systems analyze acoustic information received by a single microphone, which greatly limits their performance. Specifically, the limits of single-microphone VAD systems become apparent when the signals have a low signal-to-noise ratio (SNR) and when background noise changes rapidly.

To solve this problem, devices that remove noise using two microphones have been proposed. For example, Korean Patent Publication No. 10-2004-0101373 discloses a communication device comprising one omnidirectional microphone, one unidirectional microphone, one or more skin surface microphone sensors in contact with the user's skin, and a voice activity sensor that outputs a control signal by processing the voice activity signal of the skin surface microphone.

However, such a configuration is difficult to apply generally because the hardware configuration is complicated and the algorithm is also dependent on such hardware configuration.

Korean Patent Publication No. 10-2004-0101373

SUMMARY OF THE INVENTION It is an object of the present invention to provide a voice activity detection method which is generally applicable to a communication apparatus using two microphones and a communication apparatus employing the method.

According to an aspect of the present invention, there is provided a method for detecting voice activity in a communication device including a first microphone and a second microphone relatively far from the mouth of a speaker as compared to the first microphone, the method comprising: calculating, through a Fourier transform, the frequency components of the input signals from the first microphone and the second microphone; calculating a frequency signal power from the frequency components of each microphone, and calculating a ratio of both signal powers by comparing and analyzing the frequency signal power of the first microphone and the frequency signal power of the second microphone; and determining the presence or absence of voice activity based on the calculated ratio of both signal powers.

As an example, when the frequency signal power of the second microphone is relatively larger than the frequency signal power of the first microphone, the ratio A(l) of both signal powers in the l-th frame is calculated by

Figure pat00001
and if A(l) in the l-th frame is larger than a threshold value between 0 and 1, the frame is estimated as a noise period.

As another example, when the frequency signal power of the first microphone is relatively higher than the frequency signal power of the second microphone, the ratio B(l) of both signal powers in the l-th frame is calculated by

Figure pat00002
and if B(l) in the l-th frame is smaller than a threshold value between 0 and 1, the frame is estimated as a noise period.

If the ratio of the two signal powers is not estimated as a noise period through comparison analysis with the threshold value, it is determined that there is a voice activity and a signal indicating that there is a voice activity is output.

Here, in calculating the ratio of both signal powers, smoothing can be performed on the ratio of the power of both signals in order to protect the voice activity interval.

Meanwhile, the communication apparatus according to the present invention is characterized by including a voice activity detector for determining the presence or absence of voice activity by the above-described voice activity detection method.

Therefore, the voice activity detection method and the communication apparatus adopting the method of the present invention are applied to a communication apparatus provided with two microphones close to each other. When the source of a noise signal is relatively far from the mouth of the speaker, the noise signal is input to both microphones with a signal power of almost the same level. In contrast, the microphone close to the speaker's mouth receives the speaker's voice with a higher signal power than the microphone farther away. Using this difference, voice activity can be detected more accurately.

In addition, according to the present invention, voice activity is detected using only two microphones, so an efficient voice activity detection method that is generally applicable to communication devices using two microphones can be implemented.

FIG. 1 is a block diagram showing an internal configuration of a communication device to which the voice activity detection method of the present invention is applied.
FIG. 2 is a view showing an example of arrangement of a microphone and a speaker in a communication device having two microphones.
FIG. 3 is a view for explaining an operation of a voice activity detection method according to a preferred embodiment of the present invention.
FIG. 4 is a diagram showing an example of the frequency signal power in the l-th frame.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a block diagram illustrating an internal configuration of a communication device to which the voice activity detection method of the present invention is applied.

As shown in FIG. 1, the voice activity detection method of the present invention is applied to a communication device including at least two microphones 11 and 12. The first microphone (main microphone) 11 is located close to the mouth of the speaker, and the second microphone (sub microphone) 12 is located relatively far from the mouth of the speaker.

Preferably, the first microphone 11 is located at the lower end of the communication device, and the second microphone 12 is located at the upper end of the communication device.

The analog signals input to the respective microphones 11 and 12 are amplified to an appropriate level, converted into digital signals by the analog-to-digital converters 13 and 14, and passed to the voice activity detector 15.

The voice activity detector 15 compares and analyzes the frequency signal power of the first microphone 11 and the frequency signal power of the second microphone 12 to calculate a ratio of both signal powers. On that basis, signals other than voice activity in the communication device can be estimated as noise and removed.

The noise eliminator 16 removes ambient noise using the signal indicating the presence or absence of voice activity from the voice activity detector 15 and the digital signals obtained from the microphones 11 and 12.

The vocoder 31 encodes the noise-removed signal from the noise eliminator 16 and transmits the encoded signal to the calling party through the communication interface 41, and it decodes the voice of the calling party received through the communication interface 41.

The decoded voice signal is converted into an analog signal by the digital-to-analog converter 22, amplified to an appropriate level, and output through the speaker 21, so that the parties can converse.

FIG. 2 is a diagram showing an example of arrangement of a microphone and a speaker in a communication device having two microphones. As shown in FIG. 2, the first microphone 11 is located at the lower end of the communication device, which is near the mouth of the speaker. The second microphone 12 is located relatively far from the mouth of the speaker, for example at the top of the communication device. The speaker 21 is located at the top of the communication device, which is a position near the ear of the speaker.

FIG. 3 is a diagram for explaining an operation of a voice activity detection method according to a preferred embodiment of the present invention, and FIG. 4 is a diagram illustrating an example of the frequency signal power in the l-th frame. The operation of FIG. 3 is preferably performed for each frame, but the present invention is not limited thereto.

First, a voice signal input from the two microphones 11 and 12 is converted into a digital signal by the analog-digital converters 13 and 14, and the digital signal is input to the voice activity detector 15.

The voice activity detector 15 performs a Fourier transform (FFT) on each voice signal to obtain the frequency components of the voice signal of the first microphone 11 and the voice signal of the second microphone 12 (S100).

If the input voice signal of the first microphone is d(n) and the input voice signal of the second microphone is x(n), then d(k), the Fourier transform of d(n), and x(k), the Fourier transform of x(n), are derived by Equation 1 below.

Figure pat00003

Figure pat00004

Here, 'N' is the number of samples in the block covering the preceding predetermined period, including the current sample, used in the Fourier transform, and 0 ≤ n ≤ N-1, 0 ≤ k ≤ N-1.
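Equation 1 survives in this text only as image placeholders. Assuming it is the standard N-point discrete Fourier transform, which is consistent with the index ranges given above, it would take the following form (the exact normalization used in the original figure is not known):

```latex
d(k) = \sum_{n=0}^{N-1} d(n)\, e^{-j 2\pi k n / N}, \qquad
x(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \qquad 0 \le k \le N-1
```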

Then, the voice activity detector 15 calculates the frequency signal powers of the two microphones 11 and 12. Let D(k) and X(k) denote the frequency signal powers of the k-th frequency components d(k) and x(k), respectively; they are derived by Equation 2 below.

Figure pat00005

Figure pat00006
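The transform and power steps can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: it assumes Equation 2 is the squared magnitude of each frequency component, which is the usual definition of per-bin power but cannot be confirmed from the image placeholders. The function names `dft` and `bin_powers` are hypothetical, and a naive O(N^2) transform is used only to keep the example free of dependencies.

```python
import cmath


def dft(frame):
    """Naive N-point discrete Fourier transform of one frame of samples."""
    n_samples = len(frame)
    return [
        sum(
            frame[n] * cmath.exp(-2j * cmath.pi * k * n / n_samples)
            for n in range(n_samples)
        )
        for k in range(n_samples)
    ]


def bin_powers(frame):
    """Per-bin power, ASSUMED to be |d(k)|^2 (the Equation 2 image is missing)."""
    return [abs(component) ** 2 for component in dft(frame)]


# A unit impulse has a flat spectrum; a constant frame has all power in bin 0.
impulse_powers = bin_powers([1.0, 0.0, 0.0, 0.0])
constant_powers = bin_powers([1.0, 1.0, 1.0, 1.0])
print(impulse_powers)
print(constant_powers)
```

In practice an FFT routine would replace `dft`; the per-bin power step stays the same.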

As described above, when two or more microphones 11 and 12 are located close to each other in a communication device such as a smartphone and the source of a noise signal is relatively far from the mouth of the speaker, the noise signal is input to both microphones with a signal power of almost the same level.

In contrast, the voice of the speaker is input to the first microphone 11, located near the mouth of the speaker, with a higher frequency signal power than to the second microphone 12.

In the present invention, it is possible to estimate the noise interval using this phenomenon.
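A rough numerical illustration of this phenomenon follows. It assumes an ideal point source in free field, where received power falls off with the square of distance; that model and the distances used are illustrative assumptions, not values from the patent, and `near_to_far_power_ratio` is a hypothetical helper.

```python
def near_to_far_power_ratio(dist_to_first_mic_m, dist_to_second_mic_m):
    """Power at the first microphone divided by power at the second.

    ASSUMPTION: ideal free-field point source, power proportional to
    1 / distance**2. Real handsets deviate from this model.
    """
    if dist_to_first_mic_m <= 0 or dist_to_second_mic_m <= 0:
        raise ValueError("distances must be positive")
    return (dist_to_second_mic_m / dist_to_first_mic_m) ** 2


# Speaker's mouth: 5 cm from the first microphone, 15 cm from the second.
speech_ratio = near_to_far_power_ratio(0.05, 0.15)
# Distant noise source: 2.0 m from the first microphone, 2.1 m from the second.
noise_ratio = near_to_far_power_ratio(2.00, 2.10)

print(f"speech power ratio: {speech_ratio:.2f}")
print(f"noise power ratio:  {noise_ratio:.2f}")
```

Under these assumptions the speech arrives about nine times stronger at the first microphone, while the distant noise arrives at nearly equal power at both, which is the contrast the power ratio is meant to capture.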

When D(k) < X(k) for the values of D(k) and X(k) derived from Equation 2, that is, when the frequency signal power of the second microphone 12 is greater than the frequency signal power of the first microphone 11, the ratio A(l) of the power signals D(k) and X(k) of the two microphones 11 and 12 in the l-th frame is derived by Equation 3 below.

Figure pat00007

Where N is the number of samples in one block and l is the frame index.

In general, noise originates relatively far from the mouth of the speaker, so it is input to the two microphones 11 and 12 with frequency signal power of almost the same level, and the value of A(l) is therefore close to "1".

Accordingly, the voice activity detector 15 according to the present embodiment compares the ratio A(l) of D(k) and X(k) derived from Equation 3 with a threshold value (S310-1), and estimates the l-th frame as a noise period when A(l) > Thr_A is satisfied (S400). Here, the threshold value Thr_A is a value between 0 and 1.

The threshold value Thr_A is an optimal set value obtained from the results of repeated experiments, and the present invention is not limited to this, and it is obvious that the threshold value Thr_A may vary depending on the structure of the communication device and the characteristics of the microphone.
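The threshold test on A(l) can be sketched as follows. Equation 3 is not reproduced in this text, so the ratio used here, the sum of D(k) over the N bins divided by the sum of X(k), is an assumption chosen to match the description (a value between 0 and 1 that approaches 1 for noise). The function names and the threshold 0.8 are hypothetical.

```python
def power_ratio_a(first_mic_powers, second_mic_powers):
    """ASSUMED form of A(l): sum_k D(k) / sum_k X(k) over one frame."""
    if len(first_mic_powers) != len(second_mic_powers):
        raise ValueError("both frames must have the same number of bins")
    total_second = sum(second_mic_powers)
    if total_second <= 0.0:
        raise ValueError("second-microphone power must be positive")
    return sum(first_mic_powers) / total_second


def is_noise_frame_a(first_mic_powers, second_mic_powers, thr_a):
    """Estimate the frame as noise when A(l) > Thr_A, with 0 < Thr_A < 1."""
    if not 0.0 < thr_a < 1.0:
        raise ValueError("Thr_A must lie between 0 and 1")
    return power_ratio_a(first_mic_powers, second_mic_powers) > thr_a


# Nearly equal powers at both microphones: A(l) is close to 1.
balanced = is_noise_frame_a([1.0, 2.0, 3.0], [1.1, 2.1, 3.2], thr_a=0.8)
# Strongly unequal powers: A(l) is far below the threshold.
unbalanced = is_noise_frame_a([0.1, 0.2, 0.1], [1.0, 2.0, 1.0], thr_a=0.8)
print(balanced, unbalanced)
```

As the text notes, a usable threshold depends on the device structure and microphone characteristics, so the value above is only a placeholder.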

Conversely, when D(k) > X(k) for the values of D(k) and X(k) derived from Equation 2, that is, when the frequency signal power of the first microphone 11 is greater than the frequency signal power of the second microphone 12, the ratio B(l) of the power signals D(k) and X(k) of the two microphones 11 and 12 is derived by Equation 4 below.

Figure pat00008

Where N is the number of samples in one block and l is the frame index.

In the present embodiment, since the speaker's voice is input at a relatively large level to the first microphone 11 than the second microphone 12, B (l) at this time is a value of "1" A (l).

Therefore, the l-th frame can be estimated as a noise period when B(l) < Thr_B is satisfied (S310-2, S400). Here, the threshold value Thr_B is a value between 0 and 1.

Here, the threshold value Thr_B is also an optimum set value obtained from the result of repeated experiments, and the present invention is not limited thereto, and it is obvious that the threshold value Thr_B may vary depending on the structure of the communication device and the characteristics of the microphone.

That is, as shown in FIG. 4, the value of B(l) is large in a period in which there is no voice activity and relatively small in a period in which voice activity exists, and this difference is used to remove the noise.

Through the above-described two embodiments, the voice activity detector 15 distinguishes between the voice activity period and the noise period, and only the noise signal can be extracted and removed.

According to a preferred embodiment of the present invention, in order to protect the voice activity interval when the voice activity detector 15 estimates the noise interval, smoothing is performed on each of A(l) and B(l).

For example, in the case of A(l):

A(l) = α1 * A(l) + (1 - α1) * A(l-1)

A(l) = α2 * A(l) + (1 - α2) * A(l-1)

Here, 0 < α1 < α2 < 1.

When smoothing is performed as described above, A(l) increases slowly and decreases rapidly.

Conversely, in the case of B(l):

B(l) = β1 * B(l) + (1 - β1) * B(l-1)

B(l) = β2 * B(l) + (1 - β2) * B(l-1)

Here, 1 > β1 > β2 > 0.

When smoothing is performed as described above, B(l) increases relatively quickly compared with A(l) and decreases slowly.
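The two-coefficient smoothing can be sketched as a one-pole recursive filter whose coefficient depends on whether the ratio is rising or falling. The text does not state which coefficient applies in which case; the mapping below is inferred from the stated behavior (A(l) rises slowly and falls quickly, so the smaller α1 is used when rising; B(l) rises quickly and falls slowly, so the larger β1 is used when rising). The function name and the numeric coefficients are hypothetical.

```python
def asymmetric_smooth(previous, current, rise_coeff, fall_coeff):
    """One smoothing step: coeff * current + (1 - coeff) * previous.

    ASSUMPTION: rise_coeff is used when the new value exceeds the previous
    smoothed value, fall_coeff otherwise. A larger coefficient tracks faster.
    """
    coeff = rise_coeff if current > previous else fall_coeff
    return coeff * current + (1.0 - coeff) * previous


ALPHA_1, ALPHA_2 = 0.1, 0.9  # 0 < alpha1 < alpha2 < 1 (illustrative values)
BETA_1, BETA_2 = 0.9, 0.1    # 1 > beta1 > beta2 > 0 (illustrative values)

# A(l): slow rise (alpha1), fast fall (alpha2).
a_rising = asymmetric_smooth(0.2, 1.0, ALPHA_1, ALPHA_2)
a_falling = asymmetric_smooth(1.0, 0.2, ALPHA_1, ALPHA_2)
# B(l): fast rise (beta1), slow fall (beta2).
b_rising = asymmetric_smooth(0.2, 1.0, BETA_1, BETA_2)
b_falling = asymmetric_smooth(1.0, 0.2, BETA_1, BETA_2)

print(a_rising, a_falling, b_rising, b_falling)
```

With these values the smoothed A moves only a tenth of the way toward a higher input but nine tenths of the way toward a lower one, and B does the opposite.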

Here, the smoothing method can be variously performed through known techniques and the like, and is not limited to the examples shown in the present invention.

Thereafter, if the voice activity detector 15 does not estimate the frame as a noise period through the comparison of the ratio of both signal powers with the threshold value, it determines that there is voice activity and outputs a signal indicating that there is voice activity (S500).

The noise-period estimate and the voice activity signal are transmitted to the noise eliminator 16, which removes the noise by selectively editing the signals input from the microphones 11 and 12 on the basis of these signals.

It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention. Therefore, the technical scope of the present invention should not be limited to the contents described in the detailed description of the specification, but should be defined by the claims.

11: first microphone
12: second microphone
15: voice activity detector
16: noise eliminator

Claims (7)

1. A method for detecting voice activity in a communication device comprising a first microphone and a second microphone relatively far from the mouth of the speaker as compared to the first microphone, the method comprising:
calculating, through a Fourier transform, the frequency components of the input signals from the first microphone and the second microphone;
calculating a frequency signal power from the frequency components of each microphone, and calculating a ratio of both signal powers by comparing and analyzing the frequency signal power of the first microphone and the frequency signal power of the second microphone; and
determining whether there is voice activity based on the calculated ratio of both signal powers.
2. The method according to claim 1,
wherein, if the frequency signal power of the second microphone is relatively larger than the frequency signal power of the first microphone, the ratio A(l) of both signal powers in the l-th frame is calculated by
Figure pat00009
and if A(l) in the l-th frame is larger than a threshold value between 0 and 1, the frame is estimated as a noise period.
3. The method according to claim 1,
wherein, when the frequency signal power of the first microphone is relatively higher than the frequency signal power of the second microphone, the ratio B(l) of both signal powers in the l-th frame is calculated by
Figure pat00010
and if B(l) in the l-th frame is smaller than a threshold value between 0 and 1, the frame is estimated as a noise period.
4. The method according to claim 2 or 3,
wherein, if the frame is not estimated as a noise period through the comparison of the ratio of both signal powers with the threshold value, it is determined that there is voice activity and a signal indicating that there is voice activity is output.
5. The method according to claim 2 or 3,
wherein, in calculating the ratio of both signal powers, smoothing is performed on the ratio of both signal powers in order to protect the voice activity period.
6. The method according to claim 1,
wherein the first microphone is located at a lower end of the communication device and the second microphone is located at an upper end of the communication device.
7. A communication device comprising a first microphone and a second microphone relatively far from the mouth of the speaker as compared to the first microphone,
the communication device further comprising a voice activity detector for determining the presence or absence of voice activity by the voice activity detection method of any one of claims 1 to 6.
KR1020130128584A 2013-10-28 2013-10-28 Method for voice activity detection and communication device implementing the same KR20150048471A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020130128584A KR20150048471A (en) 2013-10-28 2013-10-28 Method for voice activity detection and communication device implementing the same

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020130128584A KR20150048471A (en) 2013-10-28 2013-10-28 Method for voice activity detection and communication device implementing the same

Publications (1)

Publication Number Publication Date
KR20150048471A true KR20150048471A (en) 2015-05-07

Family

ID=53386960

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020130128584A KR20150048471A (en) 2013-10-28 2013-10-28 Method for voice activity detection and communication device implementing the same

Country Status (1)

Country Link
KR (1) KR20150048471A (en)


Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E601 Decision to refuse application