WO2012038998A1 - Noise suppression device - Google Patents
Noise suppression device (雑音抑圧装置)
- Publication number
- WO2012038998A1 (PCT/JP2010/005711; JP2010005711W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- noise
- power spectrum
- spectrum
- suppression
- unit
- Prior art date
Classifications
- G—PHYSICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02085—Periodic noise
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02168—Noise filtering characterised by the method used for estimating noise the estimation exclusively taking place during speech pauses
- G10L21/0232—Processing in the frequency domain
Definitions
- The present invention relates to a noise suppression device that suppresses background noise mixed into an input signal. It is used to improve speech quality and recognition rates in voice communication, voice storage, and speech recognition systems such as car navigation systems, cellular phones, interphones, hands-free call systems, TV conference systems, and monitoring systems.
- In a typical noise suppression method, a time-domain input signal is converted into a power spectrum, which is a frequency-domain signal, and noise suppression is performed using the power spectrum of the input signal and an estimated noise spectrum estimated separately from the input signal.
- Specifically, an amount of suppression for the input signal is calculated, the amplitude of the power spectrum of the input signal is suppressed using the obtained amount of suppression, and a noise-suppressed signal is obtained by converting the amplitude-suppressed power spectrum, together with the phase spectrum of the input signal, back into the time domain.
- The suppression amount is calculated based on the ratio (S/N ratio) between the speech power spectrum and the estimated noise power spectrum, but when that ratio becomes negative in decibel terms the suppression amount can no longer be calculated correctly. For example, in a speech signal on which high-power automobile driving noise is superimposed at low frequencies, the low-frequency part of the speech is buried in the noise, so the S/N ratio becomes negative. As a result, the low-frequency part of the speech signal is excessively suppressed and the sound quality degrades.
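- As a minimal illustration of this problem (the bin values below are hypothetical, not from the patent), a subtraction-type suppression gain computed in a bin whose S/N ratio is negative in decibels becomes very small, so the speech in that bin is suppressed along with the noise:

```python
import numpy as np

# Hypothetical per-bin powers: one low-frequency bin dominated by driving noise,
# two bins where the speech dominates.
speech_pow = np.array([4.0, 9.0, 25.0])
noise_pow = np.array([40.0, 10.0, 5.0])
observed = speech_pow + noise_pow                 # noisy power spectrum Y

snr_db = 10.0 * np.log10(speech_pow / noise_pow)  # per-bin S/N ratio in dB
gain = np.sqrt(np.maximum(1.0 - noise_pow / observed, 0.0))  # subtraction-type gain

print(snr_db)  # approx. [-10.0, -0.5, 7.0]: negative where the noise dominates
print(gain)    # approx. [0.30, 0.69, 0.91]: the noisy low band is heavily attenuated
```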
- To address this, Patent Document 1 discloses an audio signal processing apparatus that extracts a part of the harmonic components of the fundamental frequency (pitch) of the speech from the input signal, generates a subharmonic component by squaring the extracted harmonic component, and obtains an audio signal with improved sound quality by superimposing the obtained subharmonic component on the input signal.
- The present invention has been made to solve the above-described problems, and its object is to provide a high-quality noise suppression device with simple processing.
- A noise suppression apparatus according to the present invention includes: a power spectrum calculation unit that converts a time-domain input signal into a power spectrum, which is a frequency-domain signal; a voice/noise determination unit that determines whether the power spectrum is voice or noise; a noise spectrum estimation unit that estimates the noise spectrum of the power spectrum based on the determination result of the voice/noise determination unit; a periodic component estimation unit that analyzes the harmonic structure constituting the power spectrum and estimates periodicity information of the power spectrum; a weighting factor calculation unit that calculates a weighting factor for weighting the power spectrum based on the periodicity information, the determination result of the voice/noise determination unit, and the signal information of the power spectrum; a suppression coefficient calculation unit that calculates a suppression coefficient for suppressing the noise contained in the power spectrum based on the power spectrum, the determination result of the voice/noise determination unit, and the weighting factor; a spectrum suppression unit that suppresses the amplitude of the power spectrum using the suppression coefficient; and a conversion unit that obtains a noise-suppressed signal by converting the power spectrum amplitude-suppressed by the spectrum suppression unit back into the time domain.
- According to the present invention, the harmonic structure constituting the power spectrum is analyzed, periodicity information of the power spectrum is estimated, a weighting factor for weighting the power spectrum is calculated based on the periodicity information, the determination result of the voice/noise determination unit, and the signal information of the power spectrum, a suppression coefficient for suppressing the noise contained in the power spectrum is calculated based on the power spectrum, the determination result, and the weighting factor, and the amplitude of the power spectrum is suppressed using the suppression coefficient. The S/N ratio can thereby be corrected so as to preserve the harmonic structure of the speech even in bands where the speech is buried in noise, so excessive suppression of the speech is avoided and high-quality noise suppression can be carried out.
- FIG. 1 is a block diagram illustrating the configuration of a noise suppression device according to Embodiment 1.
- FIG. 2 is an explanatory diagram schematically showing the detection of the harmonic structure of speech in the periodic component estimation unit of the noise suppression device according to Embodiment 1.
- FIG. 3 is an explanatory diagram schematically showing the correction of the harmonic structure of speech in the periodic component estimation unit of the noise suppression device according to Embodiment 1.
- FIG. 4 is an explanatory diagram schematically showing the state of the a priori SNR when a weighted a posteriori SNR is used in the S/N ratio calculation unit of the noise suppression apparatus according to Embodiment 1.
- FIG. 5 is a diagram illustrating an example of an output result of the noise suppression device according to Embodiment 1.
- FIG. 6 is a block diagram illustrating the configuration of a noise suppression device according to Embodiment 4.
- FIG. 1 is a block diagram showing a configuration of a noise suppression apparatus according to Embodiment 1 of the present invention.
- The noise suppression apparatus 100 comprises an input terminal 1, a Fourier transform unit 2, a power spectrum calculation unit 3, a periodic component estimation unit 4, a speech/noise section determination unit (voice/noise determination unit) 5, a noise spectrum estimation unit 6, a weighting factor calculation unit 7, an SN ratio calculation unit (suppression coefficient calculation unit) 8, a suppression amount calculation unit 9, a spectrum suppression unit 10, an inverse Fourier transform unit (conversion unit) 11, and an output terminal 12.
- First, voice or music captured through a microphone or the like is A/D (analog/digital) converted, sampled at a predetermined sampling frequency (for example, 8 kHz), divided into frames (for example, 10 ms), and input to the noise suppression apparatus 100 via the input terminal 1.
- The Fourier transform unit 2 applies, for example, a Hanning window to the input signal and then performs a fast Fourier transform of, for example, 256 points, as in the following equation (1), to obtain the spectral components X(λ, k).
- Here, λ is the frame number when the input signal is divided into frames, k is a number designating a frequency component within the frequency band of the power spectrum (hereinafter referred to as the spectrum number), and FT[·] represents the Fourier transform process.
- The power spectrum calculation unit 3 obtains the power spectrum Y(λ, k) from the spectral components of the input signal using the following equation (2), where Re{X(λ, k)} and Im{X(λ, k)} denote the real part and the imaginary part of the input signal spectrum after the Fourier transform, respectively.
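- A minimal numpy sketch of equations (1) and (2) follows, using the example values given in the text (8 kHz sampling, 10 ms frames, 256-point FFT with a Hanning window); the exact frame buffering and overlap are not specified in this excerpt and are assumed here:

```python
import numpy as np

FS, FRAME_MS, NFFT = 8000, 10, 256
FRAME_LEN = FS * FRAME_MS // 1000          # 80 samples per 10 ms frame

def analyze_frame(x_frame):
    """Return the complex spectrum X(lambda, k) and power spectrum Y(lambda, k)."""
    win = np.hanning(len(x_frame))
    X = np.fft.rfft(x_frame * win, n=NFFT)  # equation (1): windowing + FFT
    Y = X.real ** 2 + X.imag ** 2           # equation (2): Re{X}^2 + Im{X}^2
    return X, Y

# usage with a synthetic noisy frame
t = np.arange(FRAME_LEN) / FS
frame = np.sin(2 * np.pi * 200 * t) + 0.3 * np.random.randn(FRAME_LEN)
X, Y = analyze_frame(frame)
```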
- The periodic component estimation unit 4 receives the power spectrum Y(λ, k) output from the power spectrum calculation unit 3 and analyzes the harmonic structure of the input signal spectrum. As shown in FIG. 2, the harmonic structure is analyzed by detecting the peaks of the harmonic structure formed by the power spectrum (hereinafter referred to as spectrum peaks). Specifically, in order to remove minute peak components unrelated to the harmonic structure, for example 20% of the maximum value of the power spectrum is first subtracted from each power spectrum component, and the maxima of the spectral envelope of the power spectrum are then tracked in order from the low-frequency end.
- The power spectrum example in FIG. 2 shows the voice spectrum and the noise spectrum as separate components for ease of explanation; in the actual input signal the noise spectrum is superimposed (added) on the voice spectrum, and peaks of the voice spectrum whose power is smaller than that of the noise spectrum cannot be observed.
- Here all spectrum peaks are extracted, but the extraction may be limited to a specific frequency band, such as only a band with a good SN ratio.
- Next, the peaks of the speech spectrum buried in the noise spectrum are estimated.
- In such a band, "1" need not be set in the periodicity information p(λ, k); the same can be done in an extremely high frequency band.
- Equation (3) is the Wiener-Khinchin theorem and is not described further here.
- Next, the maximum value ρmax(λ) of the normalized autocorrelation function is obtained using equation (4).
- The obtained periodicity information p(λ, k) and the autocorrelation function maximum value ρmax(λ) are then output.
- For the periodicity estimation, known methods such as cepstrum analysis can also be used in addition to the power spectrum peak analysis and the autocorrelation function method described above.
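- The sketch below illustrates the peak detection described above (the 20%-of-maximum floor, then tracking envelope maxima from the low-frequency end) together with the normalized autocorrelation maximum of equations (3) and (4), computed via the Wiener-Khinchin relation as the inverse FFT of the power spectrum; the pitch-lag search range and the exact peak test are assumptions:

```python
import numpy as np

def estimate_periodicity(Y, fs=8000, nfft=256):
    """Return periodicity flags p(lambda, k) and the autocorrelation maximum rho_max."""
    floored = np.maximum(Y - 0.2 * Y.max(), 0.0)     # remove minute, unrelated peaks
    p = np.zeros_like(Y)
    for k in range(1, len(Y) - 1):                   # track envelope maxima upward
        if floored[k] > floored[k - 1] and floored[k] >= floored[k + 1] and floored[k] > 0:
            p[k] = 1.0                               # spectrum peak -> p(lambda, k) = 1

    # Wiener-Khinchin: the autocorrelation is the inverse FFT of the power spectrum
    r = np.fft.irfft(Y, n=nfft)
    rho = r / (r[0] + 1e-12)                         # normalized autocorrelation
    lo, hi = fs // 400, fs // 60                     # assumed pitch-lag range (60-400 Hz)
    rho_max = rho[lo:hi].max()                       # equation (4)
    return p, rho_max
```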
- The voice/noise section determination unit 5 receives the power spectrum Y(λ, k) output from the power spectrum calculation unit 3, the autocorrelation function maximum value ρmax(λ) output from the periodic component estimation unit 4, and the estimated noise spectrum N(λ, k) output from the noise spectrum estimation unit 6 described later, determines whether the input signal of the current frame is speech or noise, and outputs the result as a determination flag.
- When the current frame is judged to be speech, the determination flag Vflag is set to "1 (voice)"; otherwise, the determination flag Vflag is set to "0 (noise)" and the frame is treated as noise.
- Here, N(λ, k) is the estimated noise spectrum, and S_pow and N_pow represent the sum of the power spectrum of the input signal and the sum of the estimated noise spectrum, respectively.
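- The decision rule itself is not reproduced in this excerpt; the sketch below uses an assumed stand-in (a frame-level SNR threshold combined with a periodicity check) merely to show how S_pow, N_pow, and ρmax(λ) could yield the flag Vflag:

```python
import numpy as np

def decide_speech(Y, N, rho_max, snr_th_db=3.0, rho_th=0.5):
    """Hypothetical stand-in decision rule: return Vflag, 1 (speech) or 0 (noise)."""
    s_pow = float(np.sum(Y))          # S_pow: sum of the input power spectrum
    n_pow = float(np.sum(N)) + 1e-12  # N_pow: sum of the estimated noise spectrum
    frame_snr_db = 10.0 * np.log10(s_pow / n_pow + 1e-12)
    return 1 if (frame_snr_db > snr_th_db or rho_max > rho_th) else 0
```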
- The noise spectrum estimation unit 6 receives the power spectrum Y(λ, k) output from the power spectrum calculation unit 3 and the determination flag Vflag output from the speech/noise section determination unit 5, estimates and updates the noise spectrum according to the determination flag Vflag as in the following equation (7), and outputs the estimated noise spectrum N(λ, k).
- Here, N(λ-1, k) is the estimated noise spectrum of the previous frame, and it is held in a storage unit such as a RAM (Random Access Memory) inside the noise spectrum estimation unit 6.
- When the determination flag Vflag = 0, the input signal of the current frame has been determined to be noise, so the estimated noise spectrum N(λ-1, k) of the previous frame is updated using the power spectrum Y(λ, k) of the input signal and an update coefficient α.
- When the determination flag Vflag = 1, the input signal of the current frame is speech, so the estimated noise spectrum N(λ-1, k) of the previous frame is output unchanged as the estimated noise spectrum N(λ, k) of the current frame.
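- A minimal sketch of this update rule follows; the text states only that Y(λ, k) and the update coefficient α are used in noise frames, so the exponential-averaging form and the value of α below are assumptions:

```python
import numpy as np

def update_noise(N_prev, Y, vflag, alpha=0.1):
    """Sketch of equation (7): update N(lambda, k) in noise frames, hold it in speech frames."""
    if vflag == 0:                                   # current frame judged to be noise
        return (1.0 - alpha) * N_prev + alpha * Y    # assumed recursive averaging toward Y
    return N_prev.copy()                             # speech: keep N(lambda-1, k) unchanged
```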
- The weighting factor calculation unit 7 receives the periodicity information p(λ, k) output from the periodic component estimation unit 4, the determination flag Vflag output from the speech/noise section determination unit 5, and the S/N ratio (signal-to-noise ratio) of each spectral component output from the SN ratio calculation unit 8 described later, and calculates a weighting factor W(λ, k) for weighting the S/N ratio of each spectral component.
- Here, W(λ-1, k) is the weighting factor of the previous frame, and the smoothing is performed with a predetermined constant, for which a value of 0.8, for example, is preferable.
- w_p(k) is a weighting constant determined, for example, from the determination flag and the S/N ratio of each spectral component as in the following equation (9), and its value is smoothed with the values at the adjacent spectrum numbers.
- Here, snr(k) is the S/N ratio of each spectral component output from the S/N ratio calculation unit 8, and TH_SB_SNR is a predetermined constant threshold value.
- In this way, weighting is suppressed (the weighting constant is set to 1.0) for spectral components whose S/N ratio is estimated to be high, so that unnecessary weighting can be avoided even when the determination flag is erroneous, that is, when the current frame is judged to be speech although it is actually noise. The threshold TH_SB_SNR can be changed as appropriate according to the state of the input signal and the noise level.
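- The following sketch shows one possible reading of equations (8) and (9): spectrum-peak bins whose S/N ratio is still low are given a weight greater than 1 in speech frames, all other bins keep a weight of 1.0, and the result is smoothed across adjacent bins and across frames. The peak weight value, the threshold, and the exact smoothing are assumptions:

```python
import numpy as np

def weighting_factors(p, vflag, snr, W_prev, beta=0.8, w_peak=4.0, th_sb_snr=6.0):
    """Hypothetical sketch of equations (8)-(9) for the weighting factor W(lambda, k)."""
    # snr: per-bin S/N ratio of the previous frame (here assumed in dB)
    w = np.ones_like(snr, dtype=float)
    if vflag == 1:                                      # weighting only in speech frames
        boost = (p > 0) & (snr < th_sb_snr)             # buried harmonic peaks, low S/N ratio
        w[boost] = w_peak
    w = np.convolve(w, [0.25, 0.5, 0.25], mode="same")  # smooth over adjacent spectrum numbers
    return beta * W_prev + (1.0 - beta) * w             # smooth with the previous frame's W
```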
- The SN ratio calculation unit 8 receives the power spectrum Y(λ, k) output from the power spectrum calculation unit 3, the estimated noise spectrum N(λ, k) output from the noise spectrum estimation unit 6, the weighting factor W(λ, k) output from the weighting factor calculation unit 7, and the spectral suppression amount G(λ-1, k) of the previous frame output from the suppression amount calculation unit 9 described later, and calculates the a posteriori SNR and the a priori SNR of each spectral component.
- The a posteriori SNR γ(λ, k) can be obtained from the following equation (10) using the power spectrum Y(λ, k) and the estimated noise spectrum N(λ, k). By applying the weighting based on equation (9) above, a correction is made so that the a posteriori SNR is estimated to be higher at the spectrum peaks.
- The a priori SNR ξ(λ, k) is obtained by the following equation (11) using the spectral suppression amount G(λ-1, k) of the previous frame and the a posteriori SNR γ(λ-1, k) of the previous frame.
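- The sketch below illustrates one plausible form of equations (10) and (11): a weighted a posteriori SNR and the usual decision-directed a priori SNR estimate. Where exactly the weighting factor enters, and the decision-directed form with its smoothing constant delta, are assumptions:

```python
import numpy as np

def snr_estimates(Y, N, W, G_prev, gamma_prev, delta=0.98):
    """Sketch of equations (10)-(11): weighted a posteriori SNR and a priori SNR."""
    gamma = W * Y / (N + 1e-12)                          # eq. (10) with the weighting applied
    xi = delta * (G_prev ** 2) * gamma_prev \
        + (1.0 - delta) * np.maximum(gamma - 1.0, 0.0)   # assumed decision-directed eq. (11)
    return gamma, xi
```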
- FIG. 4 schematically shows the state of the a priori SNR when the a posteriori SNR weighted based on the weighting factor W(λ, k) is used.
- FIG. 4(a) is the same as the waveform of FIG. 3 and shows the relationship between the voice spectrum and the noise spectrum.
- FIG. 4(b) shows the state of the a priori SNR when no weighting is performed, and FIG. 4(c) shows the state of the a priori SNR when weighting is performed. FIG. 4(b) also shows the threshold TH_SB_SNR for the purpose of explaining the method.
- Comparing FIG. 4(b) and FIG. 4(c), in FIG. 4(b) the SN ratio of the peak portions of the speech spectrum buried in the noise is not extracted well, whereas in FIG. 4(c) the SN ratio of those peak portions is extracted successfully. It can also be seen that the SN ratio of the peak portions exceeding the threshold TH_SB_SNR does not become excessively large, so the method operates well.
- The a priori SNR can also be weighted, and both the a posteriori SNR and the a priori SNR may be weighted; in that case, the constants in equation (9) above may be changed so as to be suitable for weighting the a priori SNR.
- The obtained a posteriori SNR γ(λ, k) and a priori SNR ξ(λ, k) are output to the suppression amount calculation unit 9, and the a priori SNR ξ(λ, k) is also output to the weighting factor calculation unit 7 as the S/N ratio of each spectral component.
- The suppression amount calculation unit 9 obtains the spectral suppression amount G(λ, k), which is the noise suppression amount for each spectral component, from the a priori SNR ξ(λ, k) and the a posteriori SNR γ(λ, k) output from the SN ratio calculation unit 8, and outputs it to the spectrum suppression unit 10.
- The Joint MAP method estimates the spectral suppression amount G(λ, k) on the assumption that the noise signal and the speech signal follow Gaussian distributions: using the a priori SNR ξ(λ, k) and the a posteriori SNR γ(λ, k), the amplitude spectrum and the phase spectrum that maximize the conditional probability density function are obtained, and these values are used as the estimated values.
- The spectral suppression amount can be expressed by the following equation (12), using the parameters μ and ν that determine the shape of the probability density function.
- For the derivation, reference literature 1 shown below may be consulted; it is omitted here.
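- The closed-form Joint MAP gain of equation (12) is given in reference literature 1 and is not reproduced here. Purely as an illustrative stand-in, the sketch below uses the simpler Wiener-type gain ξ/(1+ξ), which likewise maps the SNR estimates to a per-bin spectral suppression amount G(λ, k); it is not the patent's equation (12):

```python
import numpy as np

def suppression_gain(xi, gamma):
    """Stand-in for equation (12): map the a priori SNR (and, in the true Joint MAP
    estimator, also the a posteriori SNR gamma) to the suppression amount G(lambda, k).
    A Wiener-type gain is used here for illustration; gamma is accepted but unused."""
    return xi / (1.0 + xi)
```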
- The spectrum suppression unit 10 performs suppression for each spectral component of the input signal according to the following equation (13), obtains the noise-suppressed speech signal spectrum S(λ, k), and outputs it to the inverse Fourier transform unit 11.
- The obtained speech spectrum S(λ, k) is subjected to an inverse Fourier transform by the inverse Fourier transform unit 11 and superimposed on the output signal of the previous frame, after which the noise-suppressed speech signal s(t) is output from the output terminal 12.
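- A minimal sketch of this synthesis step follows: the suppression amount is applied per bin (the input phase is reused), the result is transformed back to the time domain, and it is superimposed on the tail of the previous frame's output. The exact form of equation (13) and the overlap handling are assumptions:

```python
import numpy as np

def synthesize(X, G, prev_tail, nfft=256, frame_len=80):
    """Apply the suppression amount and return (output frame, tail for the next frame)."""
    S = G * X                                   # eq. (13) applied to the complex spectrum
    s = np.fft.irfft(S, n=nfft)                 # back to the time domain
    s[:len(prev_tail)] += prev_tail             # superimpose the previous frame's output
    return s[:frame_len], s[frame_len:]         # emit one frame, keep the remainder
```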
- FIG. 5 schematically shows the spectrum of the output signal in the speech section as an example of the output result of the noise suppression apparatus according to the first embodiment.
- FIG. 5(a) shows the output result of a conventional method in which the S/N ratio weighting of equation (10) is not performed, with the spectrum shown in FIG. 2 used as the input signal, and FIG. 5(b) shows the output result when the S/N ratio weighting of equation (10) is performed.
- In FIG. 5(a) the harmonic structure of the voice in the band buried in noise has disappeared, whereas in FIG. 5(b) the harmonic structure of the voice in that band is restored; it can thus be seen that good noise suppression is achieved.
- As described above, according to Embodiment 1, the S/N ratio is estimated with a correction that maintains the harmonic structure of the voice even in bands where the voice is buried in noise and the signal-to-noise ratio is negative, so excessive suppression of the speech can be avoided and high-quality noise suppression can be performed.
- Furthermore, since the harmonic structure of speech buried in noise can be corrected simply by weighting the S/N ratio, there is no need to generate a pseudo low-frequency signal or the like, and high-quality noise suppression can be performed with a small amount of processing and memory.
- In addition, since the weighting is controlled using the speech/noise section determination flag and the S/N ratio of each spectral component of the previous frame, unnecessary weighting can be suppressed in noise sections and in bands where the S/N ratio is already high, so even higher-quality noise suppression can be performed.
- In Embodiment 1, correction of both the low-frequency and high-frequency harmonic structures was described as an example; however, the present invention is not limited to this, and correction may be applied only to the low-frequency range, only to the high-frequency range, or only to a specific frequency band such as around 500 to 800 Hz. Such band-limited correction is effective, for example, for correcting sound buried in narrow-band noise such as wind noise or automobile engine sound.
- Embodiment 2. In Embodiment 1 described above, the weighting value in equation (9) was constant in the frequency direction; in Embodiment 2, the weighting value differs in the frequency direction.
- Since, as a general feature of speech, the low-frequency harmonic structure is clear, the weight can be made large at low frequencies and decreased as the frequency increases. The components of the noise suppression apparatus of Embodiment 2 are the same as in Embodiment 1, so their description is omitted.
- According to Embodiment 2, since different weighting is applied for each frequency in the S/N ratio estimation, weighting suited to each frequency of speech can be performed, and even higher-quality noise suppression can be achieved.
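- A minimal sketch of such a frequency-dependent weighting profile follows; the linear taper and its end values are assumptions, chosen only to show larger weights at low frequencies:

```python
import numpy as np

def frequency_dependent_weights(num_bins, w_low=4.0, w_high=1.0):
    """Assumed Embodiment 2 profile: strong weighting at low frequencies, tapering upward."""
    return np.linspace(w_low, w_high, num_bins)
```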
- Embodiment 3. In Embodiment 1 described above, the weighting value in equation (9) was a predetermined constant; in Embodiment 3, a plurality of weighting constants are switched according to an index of how speech-like the input signal is, or the weighting is controlled using a predetermined function. For example, using the maximum value of the normalized autocorrelation function in equation (4) as the index of speech-likeness, that is, as a control factor reflecting the state of the input signal, the weight can be increased when the periodic structure of the input signal is clear (the input signal is highly likely to be speech) and decreased when it is not. The autocorrelation function and the voice/noise section determination flag may also be used together. The components of the noise suppression apparatus of Embodiment 3 are the same as in Embodiment 1, so their description is omitted.
- According to Embodiment 3, since the weighting constant is controlled according to the state of the input signal, weighting that emphasizes the periodic structure of the speech can be applied when the input signal is highly likely to be speech, and speech deterioration can be suppressed; as a result, even higher-quality noise suppression can be performed.
- Embodiment 4. FIG. 6 is a block diagram showing the configuration of a noise suppression apparatus according to Embodiment 4 of the present invention.
- In Embodiment 1 described above, all spectrum peaks were detected for the periodic component estimation. In Embodiment 4, the S/N ratio of the previous frame calculated by the S/N ratio calculation unit 8 is output to the periodic component estimation unit 4, and when detecting spectrum peaks the periodic component estimation unit 4 uses the S/N ratio of the previous frame to detect spectrum peaks only in bands with a high S/N ratio. Similarly, the normalized autocorrelation function ρN(λ, τ) may also be calculated only in bands with a high S/N ratio. The other components are the same as those of the noise suppression apparatus of Embodiment 1, so their description is omitted.
- According to Embodiment 4, since the spectrum peaks are detected, and the normalized autocorrelation function is calculated, only in bands with a high S/N ratio, the accuracy of the spectrum peak detection and of the speech/noise section determination can be improved, and even higher-quality noise suppression can be performed.
- Embodiment 5. In Embodiments 1 to 4 described above, the weighting factor calculation unit 7 weighted the S/N ratio so as to emphasize the spectrum peaks. In Embodiment 5, conversely, the weighting emphasizes the valley portions of the spectrum, that is, the S/N ratio is weighted downward at the spectral valleys.
- The spectral valleys are detected, for example, by regarding the median spectrum number between adjacent spectrum peaks as the valley portion. The other components are the same as those of the noise suppression apparatus of Embodiment 1, so their description is omitted.
- According to Embodiment 5, since the weighting factor calculation unit 7 weights the S/N ratio of the valley portions of the spectrum downward, the frequency structure of the voice can be made to stand out, and high-quality noise suppression can be performed.
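- A minimal sketch of this valley weighting follows; the midpoint rule comes from the text, while the valley weight value is an assumption:

```python
import numpy as np

def valley_weights(p, w_valley=0.5):
    """Assumed Embodiment 5 weighting: reduce the S/N ratio at spectral valleys."""
    w = np.ones(len(p), dtype=float)
    peaks = np.flatnonzero(p)                    # spectrum numbers flagged as peaks
    for a, b in zip(peaks[:-1], peaks[1:]):
        w[(a + b) // 2] = w_valley               # median spectrum number between peaks
    return w
```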
- In Embodiments 1 to 5 described above, the maximum a posteriori method (Joint MAP method) was described as the noise suppression method, but the present invention can also be applied to other methods.
- The present invention is not limited to narrow-band telephone speech; it can also be applied, for example, to wide-band telephone speech and acoustic signals covering 0 to 8000 Hz.
- The noise-suppressed output signal can be sent in digital data format to various audio and acoustic processing devices such as a speech encoding device, a speech recognition device, a voice storage device, and a hands-free call device.
- The noise suppression device 100 of the present embodiment can be realized on a DSP (digital signal processor), either alone or together with the other devices described above, or executed as a software program.
- The program may be stored in a storage device of the computer that executes it, or distributed on a storage medium such as a CD-ROM; it is also possible to provide the program through a network.
- The output signal can also be D/A (digital/analog) converted, amplified by an amplifying apparatus, and output directly as an audio signal from a speaker or the like.
- In the above description, the S/N ratio, which is the ratio of the power spectrum of the speech to the estimated noise power spectrum, was used as the signal information of the power spectrum.
- As described above, the noise suppression device according to the present invention can be used to improve sound quality in voice communication systems such as car navigation systems, cellular phones, and interphones, as well as in video conference systems and monitoring systems, and to improve the recognition rate of speech recognition systems.
Description
Note that the weighting constant wZ(k) for the case p(λ, k) = 0 may normally be left at 1.0, that is, without weighting, but if necessary it can be controlled by the determination flag and the S/N ratio of each spectral component in the same way as wp(k). The cases distinguished in equation (9) are: periodicity information p(λ, k) = 1 with determination flag Vflag = 1 (speech), and periodicity information p(λ, k) = 1 with determination flag Vflag = 0 (noise).
Claims (5)
- 1. A noise suppression device comprising: a power spectrum calculation unit that converts a time-domain input signal into a power spectrum, which is a frequency-domain signal; a voice/noise determination unit that determines whether the power spectrum is voice or noise; a noise spectrum estimation unit that estimates a noise spectrum of the power spectrum based on a determination result of the voice/noise determination unit; a periodic component estimation unit that analyzes a harmonic structure constituting the power spectrum and estimates periodicity information of the power spectrum; a weighting factor calculation unit that calculates a weighting factor for weighting the power spectrum based on the periodicity information, the determination result of the voice/noise determination unit, and signal information of the power spectrum; a suppression coefficient calculation unit that calculates a suppression coefficient for suppressing noise contained in the power spectrum based on the power spectrum, the determination result of the voice/noise determination unit, and the weighting factor; a spectrum suppression unit that suppresses an amplitude of the power spectrum using the suppression coefficient; and a conversion unit that obtains a noise-suppressed signal by converting the power spectrum whose amplitude has been suppressed by the spectrum suppression unit into the time domain.
- 2. The noise suppression device according to claim 1, wherein the suppression coefficient calculation unit calculates a signal-to-noise ratio for each power spectrum component as the signal information of the power spectrum, and the weighting factor calculation unit calculates a weighting factor corresponding to the signal-to-noise ratio.
- 3. The noise suppression device according to claim 1, wherein the suppression coefficient calculation unit calculates a weighting factor whose weighting strength is controlled according to the determination result of the voice/noise determination unit.
- 4. The noise suppression device according to claim 2, wherein the suppression coefficient calculation unit calculates a signal-to-noise ratio of the power spectrum of the frame immediately preceding the current frame, and the weighting factor calculation unit calculates a weighting factor whose weighting strength is controlled according to the signal-to-noise ratio of the preceding frame.
- 5. The noise suppression device according to claim 1, wherein the weighting factor calculation unit calculates a weighting factor whose weighting strength is controlled according to the band components of the power spectrum.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2012534826A JP5183828B2 (ja) | 2010-09-21 | 2010-09-21 | Noise suppression device |
CN201080069164.XA CN103109320B (zh) | 2010-09-21 | 2010-09-21 | Noise suppression device |
PCT/JP2010/005711 WO2012038998A1 (ja) | 2010-09-21 | 2010-09-21 | Noise suppression device |
DE112010005895.4T DE112010005895B4 (de) | 2010-09-21 | 2010-09-21 | Noise suppression device |
US13/814,332 US8762139B2 (en) | 2010-09-21 | 2010-09-21 | Noise suppression device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2010/005711 WO2012038998A1 (ja) | 2010-09-21 | 2010-09-21 | Noise suppression device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2012038998A1 true WO2012038998A1 (ja) | 2012-03-29 |
Family
ID=45873521
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2010/005711 WO2012038998A1 (ja) | Noise suppression device | 2010-09-21 | 2010-09-21 |
Country Status (5)
Country | Link |
---|---|
US (1) | US8762139B2 (ja) |
JP (1) | JP5183828B2 (ja) |
CN (1) | CN103109320B (ja) |
DE (1) | DE112010005895B4 (ja) |
WO (1) | WO2012038998A1 (ja) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2014051149A (ja) * | 2012-09-05 | 2014-03-20 | Yamaha Corp | Engine sound processing device |
CN104364845A (zh) * | 2012-05-01 | 2015-02-18 | Ricoh Co., Ltd. | Processing device, processing method, program, computer-readable information recording medium, and processing system |
CN108899042A (zh) * | 2018-06-25 | 2018-11-27 | Tianjin University of Science and Technology | Speech noise reduction method based on a mobile platform |
Families Citing this family (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2581904B1 (en) * | 2010-06-11 | 2015-10-07 | Panasonic Intellectual Property Corporation of America | Audio (de)coding apparatus and method |
US9304010B2 (en) * | 2013-02-28 | 2016-04-05 | Nokia Technologies Oy | Methods, apparatuses, and computer program products for providing broadband audio signals associated with navigation instructions |
WO2015005914A1 (en) * | 2013-07-10 | 2015-01-15 | Nuance Communications, Inc. | Methods and apparatus for dynamic low frequency noise suppression |
JP6339896B2 (ja) * | 2013-12-27 | 2018-06-06 | Panasonic Intellectual Property Corporation of America | Noise suppression device and noise suppression method |
WO2016009654A1 (ja) * | 2014-07-16 | 2016-01-21 | NEC Corporation | Noise suppression system, noise suppression method, and recording medium storing a program |
WO2017141317A1 (ja) * | 2016-02-15 | 2017-08-24 | Mitsubishi Electric Corporation | Acoustic signal enhancement device |
CN106452627B (zh) * | 2016-10-18 | 2019-02-15 | The 36th Research Institute of China Electronics Technology Group Corporation | Noise power estimation method and device for wideband spectrum sensing |
IL250253B (en) * | 2017-01-24 | 2021-10-31 | Arbe Robotics Ltd | A method for separating targets and echoes from noise, in radar signals |
US10587983B1 (en) * | 2017-10-04 | 2020-03-10 | Ronald L. Meyer | Methods and systems for adjusting clarity of digitized audio signals |
CN108600917B (zh) * | 2018-05-30 | 2020-11-10 | Yangzhou Hangsheng Technology Co., Ltd. | Embedded multi-channel audio management system and management method |
IL260695A (en) | 2018-07-19 | 2019-01-31 | Arbe Robotics Ltd | Method and device for eliminating waiting times in a radar system |
IL260694A (en) | 2018-07-19 | 2019-01-31 | Arbe Robotics Ltd | Method and device for two-stage signal processing in a radar system |
IL260696A (en) | 2018-07-19 | 2019-01-31 | Arbe Robotics Ltd | Method and device for structured self-testing of radio frequencies in a radar system |
IL261636A (en) | 2018-09-05 | 2018-10-31 | Arbe Robotics Ltd | Deflected MIMO antenna array for vehicle imaging radars |
US10587439B1 (en) * | 2019-04-12 | 2020-03-10 | Rovi Guides, Inc. | Systems and methods for modifying modulated signals for transmission |
US11342895B2 (en) * | 2019-10-07 | 2022-05-24 | Bose Corporation | Systems and methods for modifying an audio playback |
WO2021070278A1 (ja) * | 2019-10-09 | 2021-04-15 | Mitsubishi Electric Corporation | Noise suppression device, noise suppression method, and noise suppression program |
CN113744754B (zh) * | 2021-03-23 | 2024-04-05 | JD Technology Holdings Co., Ltd. | Method and device for enhancement processing of speech signals |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002149200A (ja) * | 2000-08-31 | 2002-05-24 | Matsushita Electric Ind Co Ltd | Speech processing device and speech processing method |
AU2001294974A1 (en) * | 2000-10-02 | 2002-04-15 | The Regents Of The University Of California | Perceptual harmonic cepstral coefficients as the front-end for speech recognition |
DE60142800D1 (de) | 2001-03-28 | 2010-09-23 | Mitsubishi Electric Corp | Rauschunterdrücker |
US7027591B2 (en) * | 2002-10-16 | 2006-04-11 | Ericsson Inc. | Integrated noise cancellation and residual echo suppression |
US7359838B2 (en) * | 2004-09-16 | 2008-04-15 | France Telecom | Method of processing a noisy sound signal and device for implementing said method |
US20080243496A1 (en) * | 2005-01-21 | 2008-10-02 | Matsushita Electric Industrial Co., Ltd. | Band Division Noise Suppressor and Band Division Noise Suppressing Method |
JP4827675B2 (ja) | 2006-09-25 | 2011-11-30 | Sanyo Electric Co., Ltd. | Low-frequency band sound restoration device, audio signal processing device, and recording equipment |
JP5275612B2 (ja) * | 2007-07-18 | 2013-08-28 | Wakayama University | Periodic signal processing method, periodic signal conversion method, periodic signal processing device, and periodic signal analysis method |
US20110125490A1 (en) | 2008-10-24 | 2011-05-26 | Satoru Furuta | Noise suppressor and voice decoder |
WO2010113220A1 (ja) * | 2009-04-02 | 2010-10-07 | Mitsubishi Electric Corporation | Noise suppression device |
EP2546831B1 (en) | 2010-03-09 | 2020-01-15 | Mitsubishi Electric Corporation | Noise suppression device |
-
2010
- 2010-09-21 DE DE112010005895.4T patent/DE112010005895B4/de active Active
- 2010-09-21 US US13/814,332 patent/US8762139B2/en active Active
- 2010-09-21 WO PCT/JP2010/005711 patent/WO2012038998A1/ja active Application Filing
- 2010-09-21 CN CN201080069164.XA patent/CN103109320B/zh active Active
- 2010-09-21 JP JP2012534826A patent/JP5183828B2/ja active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001344000A (ja) * | 2000-05-31 | 2001-12-14 | Toshiba Corp | Noise canceller, communication device provided with the noise canceller, and storage medium storing a noise cancellation processing program |
JP2004341339A (ja) * | 2003-05-16 | 2004-12-02 | Mitsubishi Electric Corp | Noise suppression device |
WO2005124739A1 (ja) * | 2004-06-18 | 2005-12-29 | Matsushita Electric Industrial Co., Ltd. | Noise suppression device and noise suppression method |
JP2006113515A (ja) * | 2004-09-16 | 2006-04-27 | Toshiba Corp | Noise suppression device, noise suppression method, and mobile communication terminal device |
JP2006201622A (ja) * | 2005-01-21 | 2006-08-03 | Matsushita Electric Ind Co Ltd | Band-division noise suppression device and band-division noise suppression method |
JP2008129077A (ja) * | 2006-11-16 | 2008-06-05 | Matsushita Electric Ind Co Ltd | Noise removal device |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104364845A (zh) * | 2012-05-01 | 2015-02-18 | Ricoh Co., Ltd. | Processing device, processing method, program, computer-readable information recording medium, and processing system |
CN104364845B (zh) * | 2012-05-01 | 2017-03-08 | Ricoh Co., Ltd. | Processing device, processing method, program, computer-readable information recording medium, and processing system |
JP2014051149A (ja) * | 2012-09-05 | 2014-03-20 | Yamaha Corp | Engine sound processing device |
CN108899042A (zh) * | 2018-06-25 | 2018-11-27 | Tianjin University of Science and Technology | Speech noise reduction method based on a mobile platform |
Also Published As
Publication number | Publication date |
---|---|
DE112010005895B4 (de) | 2016-12-15 |
JP5183828B2 (ja) | 2013-04-17 |
JPWO2012038998A1 (ja) | 2014-02-03 |
US20130138434A1 (en) | 2013-05-30 |
CN103109320B (zh) | 2015-08-05 |
CN103109320A (zh) | 2013-05-15 |
DE112010005895T5 (de) | 2013-07-18 |
US8762139B2 (en) | 2014-06-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5183828B2 (ja) | Noise suppression device | |
JP5646077B2 (ja) | Noise suppression device | |
JP5875609B2 (ja) | Noise suppression device | |
JP5265056B2 (ja) | Noise suppression device | |
EP2546831B1 (en) | Noise suppression device | |
US8571231B2 (en) | Suppressing noise in an audio signal | |
JP5071346B2 (ja) | Noise suppression device and noise suppression method | |
JP4753821B2 (ja) | Sound signal correction method, sound signal correction device, and computer program | |
JP5245714B2 (ja) | Noise suppression device and noise suppression method | |
EP3276621B1 (en) | Noise suppression device and noise suppressing method | |
JP5595605B2 (ja) | Speech signal restoration device and speech signal restoration method | |
JPWO2018163328A1 (ja) | Acoustic signal processing device, acoustic signal processing method, and hands-free call device | |
JP5840087B2 (ja) | Speech signal restoration device and speech signal restoration method | |
JP5131149B2 (ja) | Noise suppression device and noise suppression method | |
CN111226278B (zh) | Low-complexity voiced speech detection and pitch estimation | |
JP6261749B2 (ja) | Noise suppression device, noise suppression method, and noise suppression program | |
JP2006201622A (ja) | Band-division noise suppression device and band-division noise suppression method | |
Esch et al. | Combined reduction of time varying harmonic and stationary noise using frequency warping |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 201080069164.X Country of ref document: CN |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 10857496 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2012534826 Country of ref document: JP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 13814332 Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: 112010005895 Country of ref document: DE Ref document number: 1120100058954 Country of ref document: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 10857496 Country of ref document: EP Kind code of ref document: A1 |