KR102012522B1 - Apparatus for processing directional sound

Info

Publication number: KR102012522B1
Application number: KR1020130037586A
Authority: KR
Inventors: 고한석; 윤종성; 금민석; 박진수; 문성규
Original assignee: Korea University Research and Business Foundation (고려대학교 산학협력단)
Priority date: 2013-04-05
Filing date: 2013-04-05
Publication date: 2019-08-20
Also published as: KR20140121168A

Abstract

The present invention discloses a directional sound processing apparatus. The apparatus includes: a plurality of microphones for receiving ambient sound; a coherence calculator for calculating coherence for each frequency band of the sound signals input from the plurality of microphones; an estimator for estimating the existence probability of a speech signal based on the sound signal input from each of the plurality of microphones; a weight calculator configured to assign different weights to respective frequency bands according to the estimated speech signal existence probability, and to output a result value by applying the assigned weights to the coherence; and a controller which compares the result value with a preset threshold and determines whether to process the sound signals input from the plurality of microphones.

Description

Directional acoustic signal processing device {APPARATUS FOR PROCESSING DIRECTIONAL SOUND}

The present invention relates to an apparatus for processing directional acoustic signals, and more particularly, to an apparatus for selectively acquiring and processing a sound in a desired direction in a noise environment.

The microphone refers to a device that receives sound waves or ultrasonic waves and generates electric signals according to the vibrations.

Recently, with the development of robot-related technology, a microphone is used as an interface for communication between a robot and a user, and the robot recognizes a user's sound by converting a sound signal input through the microphone into an electrical signal.

This recognition process is always accompanied by noise generated in the environment surrounding the microphone. To recognize a wider variety of commands accurately and to improve user convenience, it is essential to receive only the sound signal of interest, excluding the noise.

In addition, a large number of devices, such as a mobile communication terminal and a navigation device, have a user's voice recognition function, and in such a device, it is essential to process only a voice signal in a desired direction for an operation based on accurate voice recognition.

A number of conventional sound equipment companies have proposed an apparatus for blocking ambient noise by applying active noise canceling technology.

Conventional devices, however, are used for the purpose of creating a quiet environment that eliminates all ambient noise.

In addition, although the conventional technology of obtaining only a sound coming from a specific direction by using a multi-channel microphone has been applied to headphones and hearing aids, there is a limitation in utilization because it only accepts sound in a direction in which a person's eyes are directed.

More specifically, conventional noise canceling methods include a method using a generalized sidelobe canceller (GSC) and a method using a phase difference between signals input to two microphones.

The former method suffers from output divergence or slow convergence, owing to the convergence problem of the adaptive filter in the GSC.

The latter method has no convergence problem because it does not use an adaptive filter. However, because it relies only on the phase information between two microphones, the system reacts sensitively to small changes in phase, so temporary errors occur frequently.

Korean Patent Laid-Open No. 2008-0000478 (title of invention: Method and apparatus for removing noise of signals input by a plurality of microphones in a portable terminal) discloses a method and apparatus for removing noise based on the phase difference between signals input to a portable terminal.

However, as described above, this prior patent relies only on the phase difference to remove directional signals, so the stability of the system is low and noise cannot be effectively removed from the voice signal.

To solve the above problems of the prior art, the present invention proposes a directional acoustic signal processing apparatus that can obtain only the sound from a specific direction desired by the user.

To solve the above technical problem, an embodiment of the present invention provides a directional acoustic signal processing device including: a plurality of microphones for receiving ambient sound; a coherence calculator for calculating coherence for each frequency band of the sound signals input from the plurality of microphones; an estimator for estimating the existence probability of a speech signal based on the sound signal input from each of the plurality of microphones; a weight calculator configured to assign different weights to respective frequency bands according to the estimated speech signal existence probability, and to output a result value by applying the assigned weights to the coherence; and a controller which compares the result value with a preset threshold and determines whether to process the sound signals input from the plurality of microphones.

The device may further include an interface unit for receiving an angle at which the user wants to listen, and a phase converting unit for converting the phases of the sound signals input from the plurality of microphones according to the listening angle input through the interface unit.

The estimator may include a background noise estimator for estimating background noise from the sound signal input from each of the plurality of microphones, and a signal existence probability estimator for estimating the speech signal existence probability of each frequency band based on the background noise.

The weight calculator may assign a weight of 1 when the signal existence probability is equal to or greater than a preset value, and may assign the signal existence probability itself as the weight when it is less than the preset value.

The result value may be the sum, over the frequency bands, of the coherence of each frequency band multiplied by its weight.

The apparatus may further include a sound output selection unit configured to select whether the signals input from the plurality of microphones are output to at least one of a speaker, a voice recognition unit, and a transmission unit according to the determination of the controller.

According to the present invention, the background noise is estimated and the signal existence probability is used to decide whether to process the acoustic signal, so that only the voice of the actual user, excluding the noise, can be selectively processed.

FIG. 1 is a diagram showing the configuration of a directional sound processing apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing a detailed configuration of the spatial acoustic activity detection unit according to the present invention.
FIG. 3 is a diagram illustrating an on/off process of the sound output selection unit according to the present invention.
FIG. 4 illustrates acoustic attenuation and processing conditions in accordance with the present invention.

As the present invention allows for various changes and numerous embodiments, particular embodiments will be illustrated in the drawings and described in detail in the written description. However, this is not intended to limit the present invention to specific embodiments, it should be understood to include all modifications, equivalents, and substitutes included in the spirit and scope of the present invention. In describing the drawings, similar reference numerals are used for similar elements.

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the following description, the same reference numerals are used for the same elements regardless of the figure number, in order to facilitate overall understanding.

FIG. 1 is a block diagram of a directional sound processing apparatus according to a preferred embodiment of the present invention.

As shown in FIG. 1, the directional sound processing apparatus according to the present invention may include a plurality of microphones 100-1 to 100-n, a spatial acoustic activity detection unit 102, an interface unit 104, and a sound output selection unit 106.

As shown in FIG. 1, in the directional sound processing apparatus according to the present invention, sounds from different directions, such as front and side sounds, are input through a plurality of microphones 100.

The spatial acoustic activity detecting unit 102 determines whether to output the sound input through the microphone 100 to the speaker or transmit the signal to the counterpart terminal according to a preset algorithm.

Preferably, the spatial acoustic activity detection unit 102 according to the present invention estimates the background noise, estimates the voice signal presence probability from the estimated background noise, and finally determines, based on the estimated signal presence probability, whether to process the signal input through the microphones.

At this time, the spatial acoustic activity detection unit 102 determines whether to process the signal of the input sound at a predetermined time interval (for example, 20 ms or 30 ms interval).

Herein, processing the signal input through the microphone may include outputting the input sound signal through a speaker; when the device is a voice recognition device, outputting it to a voice recognition processing unit (not shown); or, when the device is a call device, delivering it to the counterpart device.

Preferably, the sound output selection unit 106 is connected to a speaker, a voice recognition processing unit, or a transmission unit, and whether sound is output is determined according to the determination of the spatial acoustic activity detection unit 102.

FIG. 2 is a view showing a detailed configuration of the spatial acoustic activity detection unit according to the present invention.

As shown in FIG. 2, the spatial acoustic activity detection unit 102 according to the present invention may include an FFT unit 200, a phase shifter 202, a coherence calculator 204, an estimator 206, a weight calculator 208, and a controller 210.

FIG. 2 shows only the first microphone 100-1 and the second microphone 100-2 for convenience of description, but those skilled in the art will understand that the present invention is not limited thereto.

Assume that the sound signal input through the first microphone 100-1 is X(t, w) and the sound signal input through the second microphone 100-2 is Y(t, w).

Here, t is the time (frame) index and w is the frequency band index.

The FFT unit 200 converts the sound signals input through the first and second microphones 100-1 and 100-2 into the frequency domain. Preferably, the FFT unit 200 performs a Short Time Fast Fourier Transform (STFFT) to convert the sound signal input through each microphone into the frequency domain.
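
As a minimal sketch of this step, the following Python splits a sampled signal into frames and converts each frame to the frequency domain. The frame length, hop size, Hann window, and the function names `frame_signal`, `dft`, and `stft` are illustrative assumptions; the description states only that a short-time transform is applied. A naive DFT stands in for an FFT to keep the example dependency-free.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames (assumed layout, no padding)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def dft(frame):
    """Naive DFT of one Hann-windowed frame; returns bins 0..N/2."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * k / n))
                for k, x in enumerate(frame)]
    return [sum(windowed[k] * cmath.exp(-2j * math.pi * w * k / n)
                for k in range(n))
            for w in range(n // 2 + 1)]


def stft(samples, frame_len=320, hop=160):
    """Return X(t, w) as a list of frames, each a list of complex bins."""
    return [dft(f) for f in frame_signal(samples, frame_len, hop)]
```

At an assumed sampling rate of 16 kHz, `frame_len=320` corresponds to a 20 ms frame, in line with the 20 ms or 30 ms decision interval mentioned above.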

The phase shifter 202 converts the phase values, at each frequency, of the first and second microphone signals according to the listening angle preset by the user through the interface unit 104.

Here, the listening angle is defined as the angle from which the user wants to hear sound. For example, if the angle of sound arriving from the front of the microphones 100-1 and 100-2 is taken as 0°, the listening angle can generally be set to 0°. However, when the user wants to hear sound arriving from the side, the listening angle may be set to, for example, 90°, and the phase shifter 202 converts the phase of the sound input at the 90° angle to that of 0°; that is, the phase value is converted as if the sound had been input from the front.
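
The phase conversion can be sketched as follows, under assumptions that the description does not state: a far-field source, two microphones separated by `mic_spacing` metres, a speed of sound of 343 m/s, and a sign convention in which the second microphone receives sound later than the first for positive angles. The function name `steer` and all parameters are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def steer(spectrum_y, listen_deg, mic_spacing, sample_rate, frame_len):
    """Rotate the phase of the second microphone's bins so that a source at
    listen_deg looks as if it had arrived from the front (0 degrees)."""
    # Extra travel time to the second microphone for a plane wave (assumed).
    delay = mic_spacing * math.sin(math.radians(listen_deg)) / SPEED_OF_SOUND
    steered = []
    for w, value in enumerate(spectrum_y):
        freq = w * sample_rate / frame_len  # bin centre frequency in Hz
        # Undo the phase lag 2*pi*f*delay accumulated at this frequency.
        steered.append(value * cmath.exp(2j * math.pi * freq * delay))
    return steered
```

With `listen_deg=0` the delay is zero and the spectrum is returned unchanged, which matches the default case in which the user listens to the front.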

After the phase shift, the coherence calculator 204 calculates coherence for each frequency band through Equation 1 below.

Here, the terms of Equation 1 are the frequency coherence, the cross-correlation value of the inputs X and Y of the first microphone 100-1 and the second microphone 100-2, and the auto-correlation values of X and Y.
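
One common way to compute a per-band coherence from a cross-correlation value and two auto-correlation values is the magnitude-squared coherence. The sketch below uses that form as an assumption, since it is not confirmed to be identical to Equation 1. The recursive smoothing factor `alpha` and the names `update_spectra` and `coherence` are also hypothetical. Averaging over frames is needed because a single-frame estimate always yields a coherence of 1.

```python
def update_spectra(state, x_bins, y_bins, alpha=0.8):
    """Recursively average cross- and auto-spectra per frequency band.

    state is a dict holding lists 'xy', 'xx', 'yy'; pass {} on the first call.
    """
    if not state:
        state.update(xy=[0j] * len(x_bins),
                     xx=[0.0] * len(x_bins),
                     yy=[0.0] * len(x_bins))
    for w, (x, y) in enumerate(zip(x_bins, y_bins)):
        state["xy"][w] = alpha * state["xy"][w] + (1 - alpha) * x * y.conjugate()
        state["xx"][w] = alpha * state["xx"][w] + (1 - alpha) * abs(x) ** 2
        state["yy"][w] = alpha * state["yy"][w] + (1 - alpha) * abs(y) ** 2
    return state


def coherence(state, eps=1e-12):
    """Magnitude-squared coherence, between 0 and 1, for each band."""
    return [abs(xy) ** 2 / (xx * yy + eps)
            for xy, xx, yy in zip(state["xy"], state["xx"], state["yy"])]
```

Bands in which X and Y keep a stable phase relationship across frames, as for a source at the steered angle, give values near 1, while bands with fluctuating phase give values near 0.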

According to an exemplary embodiment of the present invention, the estimator 206 estimates a voice signal presence probability based on signals input from each of the plurality of microphones.

In more detail, the estimator 206 may include a background noise estimator 220 and a signal existence probability estimator 222.

The background noise estimator 220 estimates the background noise from the microphone input signal using the Minima Controlled Recursive Averaging (MCRA) method.

The signal existence probability estimator 222 calculates p(t, w), the speech existence probability of each frequency band, from the estimated background noise component through Equation 2 below.

Here, the terms of Equation 2 include the prior speech absence probability.
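
For illustration, a speech presence probability that is widely used together with MCRA-style noise estimation takes the form p = 1 / (1 + (q / (1 - q)) (1 + ξ) exp(-v)), with v = γξ / (1 + ξ), where q is the prior speech absence probability, γ is the a posteriori SNR, and ξ is the a priori SNR. Whether this matches Equation 2 term for term is an assumption. The sketch also replaces the usual decision-directed estimate of ξ with the crude estimate max(γ - 1, 0), and the function name is hypothetical.

```python
import math


def speech_presence_probability(power, noise_power, q=0.5, eps=1e-12):
    """p(t, w) for one band from |X(t, w)|^2 and the estimated noise power.

    q is the prior speech absence probability (0 < q < 1).
    """
    gamma = power / (noise_power + eps)   # a posteriori SNR
    xi = max(gamma - 1.0, 0.0)            # crude a priori SNR (assumption)
    v = min(gamma * xi / (1.0 + xi), 500.0)  # capped to keep exp() stable
    return 1.0 / (1.0 + (q / (1.0 - q)) * (1.0 + xi) * math.exp(-v))
```

When the band power equals the noise estimate, the function returns 1 - q; as the band power rises well above the noise, the probability approaches 1.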

Based on p(t, w) calculated through Equation 2, the weight calculator 208 calculates the weight.

According to an embodiment of the present invention, if p(t, w) is 0.8 or greater, the weight is set to 1; in other cases, p(t, w) itself is given as the weight.

Here, 0.8 is an experimentally determined value; the value at or above which the weight is set to 1 is not limited thereto and may be varied.
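
The weighting rule described above can be written directly. The function name `band_weight` and the parameter name `high` are hypothetical; the default of 0.8 is the experimentally determined value given in the description.

```python
def band_weight(p, high=0.8):
    """Return 1 when the speech presence probability p is at least `high`,
    otherwise return p itself as the weight."""
    return 1.0 if p >= high else p
```

For example, a band with p = 0.9 contributes its full coherence, while a band with p = 0.3 contributes only 30% of it.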

The weight calculator 208 multiplies the coherence calculated above by the above-mentioned weight for each frequency band, as shown in Equation 3 below.

In addition, the weight calculator 208 sums the weighted coherence values over the frequency bands, as shown in Equation 4 below, and outputs the result value to the controller 210.

The controller 210 according to the present invention determines whether the result value output from the weight calculator 208 exceeds a preset threshold value, as shown in Equation 5. When the threshold value is exceeded, the controller 210 considers that a voice is being input from the angle at which the user wants to listen. That is, the controller 210 determines to process the currently input sound signal when the result value exceeds the threshold.

On the other hand, if the result of the weight calculator 208 does not exceed the threshold, the controller 210 determines to attenuate the currently input sound signal.
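
The weighted summation and the threshold comparison described above can be sketched as follows. The function names and the threshold values in the usage note are hypothetical; the description does not give a numerical threshold.

```python
def weighted_coherence_sum(coherences, weights):
    """Sum over frequency bands of coherence multiplied by its weight."""
    return sum(c * a for c, a in zip(coherences, weights))


def should_process(coherences, weights, threshold):
    """True when the result value exceeds the preset threshold, meaning the
    current frame is processed; False means the frame is attenuated."""
    return weighted_coherence_sum(coherences, weights) > threshold
```

Because the sum is not normalized in this sketch, a suitable threshold scales with the number of frequency bands. With coherences `[0.9, 0.8, 0.2]` and weights `[1.0, 1.0, 0.1]`, the frame is processed for a threshold of 1.0 and attenuated for a threshold of 2.0.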

According to the present invention, the controller 210 outputs a control signal according to the determination result to the sound output selector 106.

As shown in FIG. 3, the sound output selector 106 selects whether to output the sound signal input through the microphone according to the on / off control signal of the controller 210.

Preferably, the sound output selector 106 may be implemented as a switching element.

In more detail, as illustrated in FIG. 3B, the controller 210 controls the sound signal input from the microphone to be input to the speaker, the voice recognition unit, or the transmitter only when the result value of the weight calculator 208 exceeds the threshold.

As shown in FIG. 4, when sound is input from a direction not desired by the user, that signal is attenuated and only sound input from the direction desired by the user is output, so the apparatus can be used efficiently in a noisy environment.
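
The on/off behaviour of the sound output selector can be sketched as a per-frame gate. The function name `select_output` and the `attenuation` parameter are hypothetical; a value of 0.0 models a pure switch, and a small positive value models partial attenuation.

```python
def select_output(frame, process, attenuation=0.0):
    """Pass the frame unchanged when `process` is True; otherwise scale
    every sample by `attenuation` before it reaches the output stage."""
    if process:
        return list(frame)
    return [attenuation * s for s in frame]
```

The `process` flag would come from the controller's per-frame decision, so the gate can change state at every decision interval.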

Embodiments of the present invention may be implemented in the form of program instructions that can be executed by various computer means and recorded in a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. Program instructions recorded on the medium may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well known and available to those skilled in the computer software arts. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices such as ROM, RAM, and flash memory. Examples of program instructions include machine code, such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of an embodiment of the present invention, and vice versa.

The preferred embodiments of the present invention described above are disclosed for purposes of illustration. Those skilled in the art will be able to make various modifications, changes, and additions within the spirit and scope of the present invention, and such modifications, changes, and additions should be considered to be within the scope of the following claims.

Claims

A directional acoustic signal processing device comprising:
a plurality of microphones for receiving ambient sound;
a coherence calculator for calculating coherence for each frequency band of the sound signals input from the plurality of microphones;
an estimator for estimating the existence probability of a speech signal based on the acoustic signals input from the plurality of microphones;
a weight calculator configured to assign different weights to respective frequency bands according to the estimated speech signal existence probability, and to output a result value by applying the assigned weights to the coherence;
a controller which determines whether to process the sound signals input from the plurality of microphones by comparing the result value with a preset threshold value; and
a sound output selector configured to select whether the signals input from the plurality of microphones are output to at least one of a speaker, a voice recognizer, and a transmitter according to the determination of the controller.

The device of claim 1, further comprising:
an interface unit for receiving an angle at which the user wants to listen; and
a phase shifter for converting the phases of the sound signals input from the plurality of microphones according to the listening angle input through the interface unit.

The device of claim 1,
wherein the estimator comprises:
a background noise estimator for estimating background noise from the acoustic signal input from each of the plurality of microphones; and
a signal existence probability estimator for estimating the speech signal existence probability of each frequency band based on the background noise.

The device of claim 1,
wherein the weight calculator
assigns a weight of 1 when the signal existence probability is equal to or greater than a preset value, and assigns the signal existence probability itself as the weight when it is less than the preset value.

The device of claim 1,
wherein the result value is the sum, over the frequency bands, of the coherence of each frequency band multiplied by its weight.

(deleted)