KR102012522B1 - Apparatus for processing directional sound - Google Patents

Info

Publication number
KR102012522B1
Authority
KR
South Korea
Prior art keywords
input
sound
microphones
signal
coherence
Prior art date
Application number
KR1020130037586A
Other languages
Korean (ko)
Other versions
KR20140121168A (en)
Inventor
고한석
윤종성
금민석
박진수
문성규
Original Assignee
고려대학교 산학협력단
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 고려대학교 산학협력단
Priority to KR1020130037586A
Publication of KR20140121168A
Application granted
Publication of KR102012522B1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L25/84 Detection of presence or absence of voice signals for discriminating voice from noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

 The present invention discloses a directional sound processing apparatus. The apparatus according to the present invention includes: a plurality of microphones for receiving ambient sound; a coherence calculator for calculating coherence for each frequency band of the sound signals input from the plurality of microphones; an estimator for estimating the existence probability of a speech signal based on the sound signal input from each of the plurality of microphones; a weight calculator configured to assign different weights to the respective frequency bands according to the estimated existence probability of the speech signal, and to output a result value obtained by applying the assigned weights to the coherence; and a controller which compares the result value with a preset threshold and determines whether to process the sound signals input from the plurality of microphones.

Description

Directional acoustic signal processing device {APPARATUS FOR PROCESSING DIRECTIONAL SOUND}

 The present invention relates to an apparatus for processing directional acoustic signals, and more particularly, to an apparatus for selectively acquiring and processing a sound in a desired direction in a noise environment.

A microphone is a device that receives sound waves or ultrasonic waves and generates electric signals according to their vibrations.

Recently, with the development of robot-related technology, a microphone is used as an interface for communication between a robot and a user, and the robot recognizes a user's sound by converting a sound signal input through the microphone into an electrical signal.

This recognition process is always accompanied by noise generated in the environment surrounding the microphone, and in order to recognize a wider variety of commands accurately and improve user convenience, it is essential to receive only the sound signal, excluding the noise.

In addition, a large number of devices, such as a mobile communication terminal and a navigation device, have a user's voice recognition function, and in such a device, it is essential to process only a voice signal in a desired direction for an operation based on accurate voice recognition.

A number of conventional sound equipment companies have proposed an apparatus for blocking ambient noise by applying active noise canceling technology.

Conventional devices, however, are used for the purpose of creating a quiet environment that eliminates all ambient noise.

In addition, although the conventional technology of obtaining only a sound coming from a specific direction by using a multi-channel microphone has been applied to headphones and hearing aids, there is a limitation in utilization because it only accepts sound in a direction in which a person's eyes are directed.

More specifically, conventional noise canceling methods include a method using a generalized sidelobe canceller (GSC) and a method using a phase difference between signals input to two microphones.

The former method suffers from output divergence or slow convergence caused by the convergence problem of the adaptive filter in the GSC.

The latter method has no convergence problem because it does not use an adaptive filter. However, because it relies only on the phase information between two microphones, the system reacts sensitively to small changes in phase, so temporary errors occur frequently.

Korean Patent Laid-Open No. 2008-0000478 (title of invention: Method and apparatus for removing noise of signals input by a plurality of microphones in a portable terminal) discloses a method and apparatus for removing noise using the phase difference between signals input to a portable terminal.

However, as described above, this conventional patent uses only the phase difference to separate the directional signal, so the stability of the system is low and the noise cannot be effectively removed from the voice signal.

In order to solve the above problems of the prior art, the present invention is to propose a directional acoustic signal processing apparatus that can obtain only the sound of a specific direction desired by the user.

In order to solve the above technical problem, according to an embodiment of the present invention, there is provided a directional acoustic signal processing apparatus including: a plurality of microphones for receiving ambient sound; a coherence calculator for calculating coherence for each frequency band of the sound signals input from the plurality of microphones; an estimator for estimating the existence probability of a speech signal based on the sound signal input from each of the plurality of microphones; a weight calculator configured to assign different weights to the respective frequency bands according to the estimated existence probability of the speech signal, and to output a result value obtained by applying the assigned weights to the coherence; and a controller which compares the result value with a preset threshold and determines whether to process the sound signals input from the plurality of microphones.

The apparatus may further include an interface unit for receiving an angle at which the user wants to listen, and a phase shifter for converting the phases of the sound signals input from the plurality of microphones according to the listening angle input through the interface unit.

The estimator may include a background noise estimator for estimating background noise through an acoustic signal input from each of the plurality of microphones; And a signal existence probability estimator estimating a speech signal existence probability of each frequency band based on the background noise.

The weight calculator may assign a weight of 1 when the signal existence probability is equal to or greater than a preset value, and may assign the signal existence probability itself as the weight when it is less than the preset value.

The result value may be the sum, over the frequency bands, of the coherence of each frequency band multiplied by its weight.

The apparatus may further include a sound output selection unit configured to select whether the sound signals input from the plurality of microphones are output to at least one of a speaker, a voice recognition unit, and a transmission unit according to the determination of the controller.

According to the present invention, since the background noise is estimated and the signal existence probability is determined to determine whether to process the acoustic signal, only the voice of the actual user except the noise can be selectively processed.

FIG. 1 is a diagram showing the configuration of a directional sound processing apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing a detailed configuration of the spatial acoustic activity detection unit according to the present invention.
FIG. 3 is a diagram illustrating an on/off process of the sound output selection unit according to the present invention.
FIG. 4 illustrates acoustic attenuation and processing conditions in accordance with the present invention.

As the present invention allows for various changes and numerous embodiments, particular embodiments will be illustrated in the drawings and described in detail in the written description. However, this is not intended to limit the present invention to specific embodiments, it should be understood to include all modifications, equivalents, and substitutes included in the spirit and scope of the present invention. In describing the drawings, similar reference numerals are used for similar elements.

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the following description, the same reference numerals are used for the same elements regardless of the figure number in order to facilitate overall understanding.

FIG. 1 is a block diagram of a directional sound processing apparatus according to a preferred embodiment of the present invention.

As shown in FIG. 1, the directional sound processing apparatus according to the present invention may include a plurality of microphones 100-1 to 100-n, a spatial acoustic activity detection unit 102, an interface unit 104, and a sound output selection unit 106.

As shown in FIG. 1, in the directional sound processing apparatus according to the present invention, sounds from different directions, such as front and side sounds, are input through a plurality of microphones 100.

The spatial acoustic activity detecting unit 102 determines whether to output the sound input through the microphone 100 to the speaker or transmit the signal to the counterpart terminal according to a preset algorithm.

Preferably, the spatial acoustic activity detection unit 102 according to the present invention estimates the background noise, estimates the voice signal presence probability from the estimated background noise, and finally determines, based on the estimated signal presence probability, whether to process the signal input through the microphone.

At this time, the spatial acoustic activity detection unit 102 determines whether to process the signal of the input sound at a predetermined time interval (for example, 20 ms or 30 ms interval).

Here, processing the signal input through the microphone may include determining whether to output the input sound signal through a speaker, whether to output it to a voice recognition processing unit (not shown) when the device is a voice recognition device, or whether to deliver it to the counterpart device when the device is a call device.

Preferably, the sound output selection unit 106 is connected to a speaker, a voice recognition processing unit, or a transmission unit, and whether the sound is output is decided according to the determination of the spatial acoustic activity detection unit 102.

FIG. 2 is a view showing a detailed configuration of the spatial acoustic activity detection unit according to the present invention.

As shown in FIG. 2, the spatial acoustic activity detection unit 102 according to the present invention may include an FFT unit 200, a phase shifter 202, a coherence calculator 204, an estimator 206, a weight calculator 208, and a controller 210.

In FIG. 2, only the first microphone 100-1 and the second microphone 100-2 are shown for convenience of description, but it should be understood by those skilled in the art that the present invention is not limited thereto.

Assume that the sound signal input through the first microphone 100-1 is X(t, w) and the sound signal input through the second microphone 100-2 is Y(t, w).

Here, t is time (frame) and w means frequency band index.

The FFT unit 200 converts the sound signals input through the first and second microphones 100-1 to 100-2 into the frequency domain. Preferably, the FFT unit 200 performs a Short Time Fast Fourier Transform (STFFT) to convert the sound signal input through the microphone into the frequency domain.
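As a rough illustration of this framing-and-transform step, the following Python sketch splits a signal into overlapping windowed frames and transforms each frame to the frequency domain. The frame length, hop size, Hann window, and the function names `dft` and `stft` are assumptions made for the example and are not specified in the patent; a naive DFT stands in for an FFT only to keep the sketch free of external libraries.

```python
import cmath
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def dft(frame):
    """Naive O(N^2) DFT of a real frame; returns bins 0..N/2."""
    n = len(frame)
    return [
        sum(frame[k] * cmath.exp(-2j * math.pi * w * k / n) for k in range(n))
        for w in range(n // 2 + 1)
    ]

def stft(signal, frame_len=64, hop=32):
    """Short-time transform; the result is indexed as frames[t][w]."""
    win = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + i] * win[i] for i in range(frame_len)]
        frames.append(dft(seg))
    return frames

# Toy input: a 1 kHz tone sampled at 8 kHz (both values are illustrative).
fs = 8000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(256)]
X = stft(tone)
peak_bin = max(range(len(X[0])), key=lambda w: abs(X[0][w]))
print(len(X), len(X[0]), peak_bin)
```

With these settings each bin spans fs / frame_len = 125 Hz, so the tone's energy concentrates in the bin nearest 1 kHz. A real implementation would use an FFT routine and a frame length matching the 20 ms or 30 ms decision interval mentioned earlier.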

The phase shifter 202 converts the phase values, for each frequency, of the first and second microphone signals according to the listening angle preset by the user through the interface unit 104.

Here, the listening angle is defined as the angle from which the user wants to hear sound. For example, assuming that the angle of sound input from the front with respect to the microphones 100-1 and 100-2 is 0°, the listening angle can generally be set to 0°. However, in some cases, when the user wants to hear sound input from the side, the listening angle may be set to, for example, 90°, and the phase shifter 202 converts the phase of the sound input at the 90° angle to that of 0°, i.e., the phase value is converted as if the sound were input from the front.
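One common way to realize such a phase conversion is delay compensation under a far-field plane-wave model. The sketch below assumes a two-microphone array with spacing `mic_dist`, a speed of sound `c`, and a sign convention in which the second microphone lags the first for positive angles. None of these details (nor the function names) come from the patent, so this is only a plausible reading of the step.

```python
import cmath
import math

def arrival_delay(angle_deg, mic_dist, c=343.0):
    """Inter-microphone delay (seconds) of a plane wave from angle_deg (0 = front)."""
    return mic_dist * math.sin(math.radians(angle_deg)) / c

def bin_phase(w, delay_s, fs, n_fft):
    """Phase (radians) accumulated by frequency bin w over delay_s seconds."""
    return 2.0 * math.pi * (w * fs / n_fft) * delay_s

def steer_to_front(Y_frame, listen_angle_deg, mic_dist=0.05, fs=16000, n_fft=512):
    """Rotate each bin of the second microphone so that sound arriving from
    listen_angle_deg looks as if it had arrived from 0 degrees."""
    tau = arrival_delay(listen_angle_deg, mic_dist)
    return [y * cmath.exp(1j * bin_phase(w, tau, fs, n_fft))
            for w, y in enumerate(Y_frame)]

# Toy check: simulate a source at 90 degrees, then steer the array to 90 degrees.
n_fft = 512
X = [1 + 0j] * (n_fft // 2 + 1)                      # first microphone spectrum
tau_side = arrival_delay(90.0, 0.05)
Y = [x * cmath.exp(-1j * bin_phase(w, tau_side, 16000, n_fft))
     for w, x in enumerate(X)]                       # delayed copy at microphone 2
Y_steered = steer_to_front(Y, 90.0)
max_err = max(abs(a - b) for a, b in zip(Y_steered, X))
print(max_err)
```

After steering, the two channels are in phase for sound from the listening angle, which is the condition under which a coherence measure between them is high.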

After the phase shift, the coherence calculator 204 calculates coherence for each frequency band through Equation 1 below.

[Equation 1: equation image not reproduced in this text]

Here, Equation 1 defines the coherence for each frequency band in terms of the cross-correlation value of the inputs X and Y of the first microphone 100-1 and the second microphone 100-2 and the auto-correlation values of X and Y.
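Because the equation image is not available here, the following sketch uses the standard magnitude-squared coherence, |Pxy|^2 / (Pxx * Pyy), with recursively smoothed spectra as a stand-in. The smoothing factor `alpha`, the regularizer `eps`, and the function name are assumptions; the patent's exact normalization may differ.

```python
def coherence_track(X_frames, Y_frames, alpha=0.8, eps=1e-12):
    """Per-frame, per-band magnitude-squared coherence.

    X_frames[t][w] and Y_frames[t][w] are complex spectra of the two microphones.
    The auto- and cross-spectra are smoothed over time with factor alpha.
    """
    n_bins = len(X_frames[0])
    pxx = [0.0] * n_bins
    pyy = [0.0] * n_bins
    pxy = [0j] * n_bins
    out = []
    for X, Y in zip(X_frames, Y_frames):
        row = []
        for w in range(n_bins):
            pxx[w] = alpha * pxx[w] + (1 - alpha) * abs(X[w]) ** 2
            pyy[w] = alpha * pyy[w] + (1 - alpha) * abs(Y[w]) ** 2
            pxy[w] = alpha * pxy[w] + (1 - alpha) * X[w] * Y[w].conjugate()
            row.append(abs(pxy[w]) ** 2 / (pxx[w] * pyy[w] + eps))
        out.append(row)
    return out

# Identical channels stay coherent; a channel whose phase flips every frame does not.
same_x = [[1 + 0j] for _ in range(20)]
flip_y = [[(-1.0) ** t + 0j] for t in range(20)]
coh_same = coherence_track(same_x, same_x)
coh_flip = coherence_track(same_x, flip_y)
print(coh_same[-1][0], coh_flip[-1][0])
```

The time smoothing matters: without it, a single frame always yields a coherence of 1, so the measure only becomes informative once the phase relationship has been averaged over several frames.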

According to an exemplary embodiment of the present invention, the estimator 206 estimates a voice signal presence probability based on signals input from each of the plurality of microphones.

In more detail, the estimator 206 may include a background noise estimator 220 and a signal existence probability estimator 222.

The background noise estimator 220 uses the Minima Controlled Recursive Averaging (MCRA) method to estimate the background noise from the microphone input signal.
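The idea behind MCRA can be sketched as follows: the noise estimate is a recursive average of the input power whose update is slowed down whenever the smoothed power rises well above its recent minimum, which suggests that speech is present. This is a simplified sketch, not the patent's implementation: it uses a plain sliding-window minimum, and the constants (`alpha_s`, `alpha_d`, `alpha_p`, `delta`, `win`) are illustrative assumptions.

```python
def mcra_noise(power_frames, alpha_s=0.8, alpha_d=0.95, alpha_p=0.2,
               delta=5.0, win=50):
    """Simplified minima-controlled recursive averaging.

    power_frames[t][w] is the input power |Y(t, w)|^2.
    Returns the noise power estimate for every frame and band.
    """
    n_bins = len(power_frames[0])
    smoothed = list(power_frames[0])        # time-smoothed input power
    noise = list(power_frames[0])           # running noise estimate
    p_hat = [0.0] * n_bins                  # smoothed speech-presence indicator
    history = [[s] for s in smoothed]       # recent smoothed powers per band
    out = []
    for frame in power_frames:
        for w in range(n_bins):
            smoothed[w] = alpha_s * smoothed[w] + (1 - alpha_s) * frame[w]
            history[w].append(smoothed[w])
            if len(history[w]) > win:
                history[w].pop(0)
            s_min = min(history[w])
            present = 1.0 if smoothed[w] > delta * s_min else 0.0
            p_hat[w] = alpha_p * p_hat[w] + (1 - alpha_p) * present
            # The more likely speech is present, the closer a gets to 1,
            # which freezes the noise update.
            a = alpha_d + (1 - alpha_d) * p_hat[w]
            noise[w] = a * noise[w] + (1 - a) * frame[w]
        out.append(list(noise))
    return out

# Stationary noise of power 1.0 with a short loud burst in the middle.
frames = [[1.0]] * 100 + [[100.0]] * 10 + [[1.0]] * 10
est = mcra_noise(frames)
print(est[99][0], est[109][0])
```

In this toy run the estimate stays close to the noise floor during the burst, whereas a plain recursive average with the same `alpha_d` would be pulled far upward by the loud frames.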

The signal existence probability estimator 222 calculates p (t, w), which is a speech existence probability of each frequency band, based on the estimated background noise component through Equation 2 below.

[Equation 2: equation image not reproduced in this text]

Here, the terms of Equation 2 include the prior speech absence probability; the remaining symbol definitions are given as images that are not reproduced in this text.
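Since Equation 2 is not legible here, the sketch below uses a widely used form of the speech presence probability from the statistical speech enhancement literature, p = 1 / (1 + q/(1-q) * (1+xi) * exp(-v)) with v = gamma * xi / (1 + xi), where q is the prior speech absence probability. Whether the patent uses exactly this form is an assumption, and the a priori SNR `xi` is replaced by a crude stand-in, max(gamma - 1, 0), to keep the example short.

```python
import math

def speech_presence_probability(power, noise, q=0.5):
    """Speech presence probability for one time-frequency bin.

    power: input power |Y(t, w)|^2
    noise: estimated background noise power for the same bin
    q:     prior speech absence probability (illustrative default)
    """
    gamma = power / max(noise, 1e-12)       # a posteriori SNR
    xi = max(gamma - 1.0, 0.0)              # crude a priori SNR stand-in
    v = gamma * xi / (1.0 + xi)
    ratio = (q / (1.0 - q)) * (1.0 + xi) * math.exp(-v)
    return 1.0 / (1.0 + ratio)

p_noise_only = speech_presence_probability(1.0, 1.0)
p_strong = speech_presence_probability(100.0, 1.0)
print(p_noise_only, p_strong)
```

With this stand-in for `xi`, the probability never drops below 1 - q, so a practical system would use a better a priori SNR estimator (for example a decision-directed one) to obtain probabilities near zero in noise-only bins.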

Based on p (t, w) calculated through Equation 2, the weight calculator 208 calculates the weight.

According to an embodiment of the present invention, if p(t, w) is 0.8 or greater, the weight is set to 1, and in other cases p(t, w) itself is assigned as the weight.

Here, 0.8 is an experimentally determined value; the cutoff at or above which the weight is set to 1 is not limited thereto and may be chosen differently.
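The weighting rule described above can be written as a small function. The name `band_weight` and the parameter name `cutoff` are hypothetical; the default of 0.8 is the experimentally determined value mentioned in the text.

```python
def band_weight(p, cutoff=0.8):
    """Weight for one frequency band given its speech presence probability p.

    Bands that almost certainly contain speech get full weight 1;
    all other bands are weighted by the probability itself.
    """
    return 1.0 if p >= cutoff else p

probabilities = [0.95, 0.8, 0.79, 0.3, 0.0]
weights = [band_weight(p) for p in probabilities]
print(weights)  # [1.0, 1.0, 0.79, 0.3, 0.0]
```

The effect is that bands dominated by noise contribute little to the later directional decision, while bands that clearly carry speech contribute their full coherence.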

The weight calculator 208 multiplies the previously calculated coherence by the above-mentioned weight for each frequency band, as shown in Equation 3 below.

[Equation 3: equation image not reproduced in this text]

In addition, the weight calculator 208 sums, over all frequency bands, the products of the weight and the coherence, as shown in Equation 4 below, and outputs the result to the controller 210.

[Equation 4: equation image not reproduced in this text]

The controller 210 according to the present invention determines whether the result value output from the weight calculator 208 exceeds a preset threshold value, as shown in Equation 5, and when the threshold value is exceeded, it determines that speech is being input from the angle at which the user wants to listen. That is, the controller 210 determines to process the currently input sound signal when the result value exceeds the threshold.

[Equation 5: equation image not reproduced in this text]

On the other hand, if the result of the weight calculator 208 does not exceed the threshold, the controller 210 determines to attenuate the currently input sound signal.
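The weighted sum and the threshold comparison described above can be sketched together. The function name, the example threshold, and the absence of any normalization by the number of bands are assumptions, since the equation images are not available.

```python
def should_process(coherence, weights, threshold):
    """Decide whether the current frame should be processed.

    coherence: per-band coherence values for the current frame
    weights:   per-band weights derived from the speech presence probability
    Returns (decision, score), where score is the weighted sum over bands.
    """
    score = sum(c * g for c, g in zip(coherence, weights))
    return score > threshold, score

coherence = [1.0, 1.0, 0.2, 0.1]
weights = [1.0, 1.0, 0.3, 0.1]
decision, score = should_process(coherence, weights, threshold=2.0)
print(decision, round(score, 2))  # True 2.07
```

Because the score grows with the number of bands, a threshold chosen for one FFT size would need to be rescaled for another unless the sum is normalized.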

According to the present invention, the controller 210 outputs a control signal according to the determination result to the sound output selector 106.

As shown in FIG. 3, the sound output selector 106 selects whether to output the sound signal input through the microphone according to the on / off control signal of the controller 210.

Preferably, the sound output selector 106 may be implemented as a switching element.

In more detail, as illustrated in FIG. 3B, the controller 210 controls the sound signal input from the microphone to be input to the speaker, the voice recognition unit, or the transmitter only when the result value of the weight calculator 208 exceeds the threshold.
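A software analogue of this switching behaviour is a per-frame gate driven by the controller's on/off decisions. The sketch below is illustrative only: the function name and the `attenuation` parameter (0.0 for a hard switch, a small positive value for soft attenuation) are assumptions.

```python
def gate_frames(frames, decisions, attenuation=0.0):
    """Pass frames whose decision is True; scale the others by attenuation.

    frames:    list of time-domain frames (lists of samples)
    decisions: one boolean per frame from the controller
    """
    gated = []
    for frame, on in zip(frames, decisions):
        if on:
            gated.append(list(frame))
        else:
            gated.append([s * attenuation for s in frame])
    return gated

frames = [[0.5, -0.5], [0.2, 0.4], [1.0, -1.0]]
decisions = [True, False, True]
print(gate_frames(frames, decisions))
# [[0.5, -0.5], [0.0, 0.0], [1.0, -1.0]]
```

A hard on/off switch at frame boundaries can produce audible clicks, so a practical implementation might ramp the gain between frames.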

As shown in FIG. 4, when sound is input from a direction not desired by the user, that signal is attenuated and only sound input from the direction desired by the user is output, so the apparatus can be used efficiently in a noisy environment.

Embodiments of the present invention may be implemented in the form of program instructions that can be executed by various computer means and recorded in a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. Program instructions recorded on the medium may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well known and available to those skilled in the computer software arts. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media; and memory devices such as ROM, RAM, and flash memory. Examples of program instructions include machine code, such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter. The hardware device described above may be configured to operate as at least one software module to perform the operations of an embodiment of the present invention, and vice versa.

Preferred embodiments of the present invention described above are disclosed for purposes of illustration, and those skilled in the art will be able to make various modifications, changes, and additions within the spirit and scope of the present invention. Such modifications, changes, and additions should be considered to be within the scope of the following claims.

Claims (6)

1. A directional acoustic signal processing apparatus comprising:
a plurality of microphones for receiving ambient sound;
a coherence calculator for calculating coherence for each frequency band of the sound signals input from the plurality of microphones;
an estimator for estimating the existence probability of a speech signal based on the sound signals input from the plurality of microphones;
a weight calculator configured to assign different weights to the respective frequency bands according to the estimated existence probability of the speech signal, and to output a result value obtained by applying the assigned weights to the coherence;
a controller which determines whether to process the sound signals input from the plurality of microphones by comparing the result value with a preset threshold value; and
a sound output selector configured to select whether the sound signals input from the plurality of microphones are output to at least one of a speaker, a voice recognizer, and a transmitter according to the determination of the controller.
2. The apparatus of claim 1, further comprising:
an interface unit for receiving an angle at which the user wants to listen; and
a phase shifter for converting phases of the sound signals input from the plurality of microphones according to the listening angle input through the interface unit.
3. The apparatus of claim 1,
wherein the estimator comprises:
a background noise estimator for estimating background noise from the sound signal input from each of the plurality of microphones; and
a signal existence probability estimator for estimating a speech signal existence probability of each frequency band based on the background noise.
4. The apparatus of claim 1,
wherein the weight calculator assigns a weight of 1 when the signal existence probability is equal to or greater than a preset value, and assigns the signal existence probability itself as the weight when it is less than the preset value.
5. The apparatus of claim 1,
wherein the result value is the sum, over the frequency bands, of the coherence of each frequency band multiplied by its weight.
6. (Deleted)
KR1020130037586A 2013-04-05 2013-04-05 Apparatus for processing directional sound KR102012522B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020130037586A KR102012522B1 (en) 2013-04-05 2013-04-05 Apparatus for processing directional sound

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020130037586A KR102012522B1 (en) 2013-04-05 2013-04-05 Apparatus for processing directional sound

Publications (2)

Publication Number Publication Date
KR20140121168A KR20140121168A (en) 2014-10-15
KR102012522B1 true KR102012522B1 (en) 2019-08-20

Family

ID=51992804

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020130037586A KR102012522B1 (en) 2013-04-05 2013-04-05 Apparatus for processing directional sound

Country Status (1)

Country Link
KR (1) KR102012522B1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111833901B (en) * 2019-04-23 2024-04-05 北京京东尚科信息技术有限公司 Audio processing method, audio processing device, system and medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101009854B1 (en) * 2007-03-22 2011-01-19 고려대학교 산학협력단 Method and apparatus for estimating noise using harmonics of speech
KR100943224B1 (en) * 2007-10-16 2010-02-18 한국전자통신연구원 An intelligent robot for localizing sound source by frequency-domain characteristics and method thereof

Also Published As

Publication number Publication date
KR20140121168A (en) 2014-10-15

Similar Documents

Publication Publication Date Title
KR101210313B1 (en) System and method for utilizing inter?microphone level differences for speech enhancement
KR101444100B1 (en) Noise cancelling method and apparatus from the mixed sound
US10580428B2 (en) Audio noise estimation and filtering
KR101449433B1 (en) Noise cancelling method and apparatus from the sound signal through the microphone
US9264804B2 (en) Noise suppressing method and a noise suppressor for applying the noise suppressing method
US20170140771A1 (en) Information processing apparatus, information processing method, and computer program product
US8611552B1 (en) Direction-aware active noise cancellation system
KR101475864B1 (en) Apparatus and method for eliminating noise
US20170092256A1 (en) Adaptive block matrix using pre-whitening for adaptive beam forming
CN104158990A (en) Method for processing an audio signal and audio receiving circuit
KR102076760B1 (en) Method for cancellating nonlinear acoustic echo based on kalman filtering using microphone array
US9185506B1 (en) Comfort noise generation based on noise estimation
JP6833616B2 (en) Echo suppression device, echo suppression method and echo suppression program
US9330677B2 (en) Method and apparatus for generating a noise reduced audio signal using a microphone array
US20200286501A1 (en) Apparatus and a method for signal enhancement
US9485572B2 (en) Sound processing device, sound processing method, and program
KR20160014709A (en) Echo suppression
CN111383647A (en) Voice signal processing method and device and readable storage medium
CN109215672B (en) Method, device and equipment for processing sound information
US11205437B1 (en) Acoustic echo cancellation control
KR102012522B1 (en) Apparatus for processing directional sound
KR100949910B1 (en) Method and apparatus for acoustic echo cancellation using spectral subtraction
KR102063824B1 (en) Apparatus and Method for Cancelling Acoustic Feedback in Hearing Aids
CN113824843B (en) Voice call quality detection method, device, equipment and storage medium
US10692514B2 (en) Single channel noise reduction

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E701 Decision to grant or registration of patent right
GRNT Written decision to grant