KR101295727B1

KR101295727B1 - Apparatus and method for adaptive noise estimation

Info

Publication number: KR101295727B1
Application number: KR1020110126800A
Authority: KR
Inventors: 정성일
Original assignee: (주)트란소노
Priority date: 2010-11-30
Filing date: 2011-11-30
Publication date: 2013-08-16
Also published as: KR20120059431A

Abstract

The present invention relates to an apparatus and method for adaptive noise estimation.
The specification discloses an adaptive noise estimation method comprising: obtaining a Fourier spectrum by Fourier transforming a noisy speech signal that contains a speech signal and a noise signal; obtaining a filtered Fourier spectrum by applying differential filtering to the Fourier spectrum; determining the state of the noise present in a filter bank; obtaining a first smoothed spectrum by smoothing the filtered Fourier spectrum in the frequency domain using a first smoothing coefficient set based on the determined filter bank; obtaining a second smoothed spectrum by smoothing the first smoothed spectrum in the time domain using a second smoothing coefficient set based on the determined filter bank; and estimating the noise using a forgetting factor set based on the determined filter bank, the second smoothed spectrum, and the minimum value of the second smoothed spectrum.
According to the present invention, residual musical noise and perceptible speech distortion in the enhanced speech can be suppressed efficiently.

Description

Adaptive Noise Estimation Apparatus and Method {APPARATUS AND METHOD FOR ADAPTIVE NOISE ESTIMATION}

The present invention relates to speech signal processing and, more particularly, to an apparatus and method for determining the state of the noise in a noisy speech signal and adaptively estimating that noise.

Speech recognition generally refers to the process of converting an acoustic signal, obtained through a microphone, a telephone, or a similar device, into a word, a set of words, or a sentence. The first step in improving recognition accuracy is to efficiently extract only the speech component from a single-channel input signal in which noise and speech coexist. To improve the sound quality of such a signal, the noise component must be attenuated or removed efficiently without damaging the speech component.

Accordingly, processing a noisy speech signal received over a single channel fundamentally includes a noise estimation procedure, which accurately identifies the state of the noise in the input noisy speech signal and uses it to obtain the noise component. The estimated noise signal is then used to improve sound quality by attenuating or removing the noise component from the noisy speech signal.

Noise estimation is a fundamental requirement and the most important step in single-channel speech enhancement. Speech enhanced with an incorrectly estimated noise suffers from the following problems. First, if the estimated noise is lower than the actual noise, musical noise remains. Musical noise consists of random frequency components and is an artifact that is perceptually annoying to the listener. Second, if the estimated noise is higher than the actual noise, speech distortion results. Speech distortion is the unnatural sound caused by attenuation of the speech signal. It is very difficult to estimate noise from speech corrupted by non-stationary noise without introducing musical noise or speech distortion.

One noise estimation approach is voice activity detection (VAD) and the VAD-based noise estimation that builds on it. A VAD-based method identifies the state of the noise and estimates the noise using statistical information obtained from several previously detected explicit noise frames or from a long span of past frames. An explicit noise frame is either a silent (speech-absent) frame that contains no speech or a noise-dominant frame in which the noise component overwhelmingly dominates the speech in the noisy speech signal.

Conventional VAD-based noise estimation performs quite well when the background noise does not vary much over time. However, when the background noise is non-stationary or level-varying, when the signal-to-noise ratio (SNR) is low, or when the energy of the speech signal is weak, it is difficult for a VAD-based method to obtain reliable information about the state of the noise or the current noise level, because incorrectly detected VAD information is applied. VAD-based methods also require a relatively high computational cost at several stages of noise estimation.

Several new methods have been proposed to overcome the shortcomings of VAD-based methods. A well-known one is the weighted average (WA) method, which is based on recursive averaging. The WA method estimates the noise in the frequency domain without VAD and continuously updates the estimate. It estimates the noise by applying a fixed forgetting factor between the magnitude spectrum of the noisy speech signal in the current frame and the magnitude spectrum of the noise estimated in the previous frame. Because the forgetting factor is fixed, however, the WA method cannot reflect noise changes in diverse or non-stationary noise environments and therefore cannot estimate the noise correctly.

Another approach proposed to overcome the shortcomings of VAD-based noise estimation is the minimum statistics (MS) method. It tracks the minimum of the smoothed power spectrum of the noisy speech signal over a search window and estimates the noise by multiplying the tracked minimum by a compensation constant. Here, the search window is a span of past frames covering about 1.5 seconds. The MS method generally performs well, but it needs a large amount of memory because it continuously requires information from past frames spanning the whole search window, and it cannot quickly track changes in the noise level, particularly in noise-dominant signals. Moreover, because the MS method also relies on noise information estimated from past frames, it does not give reliable results when the noise level changes sharply or the noise environment changes.

To address these shortcomings of the MS method, the minima controlled recursive averaging (MCRA) method was proposed, in which noise is estimated by recursive averaging with a smoothing coefficient controlled by the signal presence probability at the current frequency bin. Various other modified MS methods have also been proposed, and most of them share two features. First, they include a VAD step that continuously determines whether speech is present or absent in the current frame or frequency bin. Second, they use a noise estimator based on recursive averaging (RA). However, because most of these modified methods rely on the minimum tracking of the MS method, they suffer from similar problems.

An object of the present invention is to provide an apparatus and method that adaptively estimate noise without using the statistical information provided by multiple explicit noise frames detected by VAD or by a long span of past frames.

Another object of the present invention is to provide an apparatus and method that adaptively estimate noise using a magnitude signal-to-noise ratio (MSNR) and a forward tracking signal-to-noise ratio (FTSNR), which determine the state of the noise from a signal that has undergone differentiator-based filtering in the frequency domain, smoothing in the frequency domain, and adaptive smoothing in the time domain.

According to one aspect of the present invention, an adaptive noise estimation method is provided. The method includes: obtaining a Fourier spectrum by Fourier transforming a noisy speech signal that contains a speech signal and a noise signal; obtaining a filtered Fourier spectrum by applying differential filtering to the Fourier spectrum; determining the state of the noise present in a filter bank; obtaining a first smoothed spectrum by smoothing the filtered Fourier spectrum in the frequency domain using a first smoothing coefficient set based on the determined filter bank; obtaining a second smoothed spectrum by smoothing the first smoothed spectrum in the time domain using a second smoothing coefficient set based on the determined filter bank; and estimating the noise using a forgetting factor set based on the determined filter bank, the second smoothed spectrum, and the minimum value of the second smoothed spectrum.

The state of the noise may be determined by a magnitude signal-to-noise ratio (MSNR) and a forward search signal-to-noise ratio (FSSNR).

The estimating of the noise may further use the noise estimated in the previous frame.

The minimum value of the second smoothed spectrum may be the minimum spectrum detected within a given past search window.

The first smoothing coefficient, the second smoothing coefficient, and the forgetting factor may be set variably according to the result of comparing a filter bank index, determined by the difference between a first exponent that determines the total number of fast Fourier transform (FFT) points and a second exponent that determines the number of filter banks, with a secondary filter bank index that divides the full set of filter banks into low, middle, and high frequency ranges.

The first smoothing coefficient, the second smoothing coefficient, and the forgetting factor may also be set variably according to the result of comparing the MSNR and the FSSNR with thresholds.

According to another aspect of the present invention, an adaptive noise estimation apparatus is provided. The apparatus includes: a Fourier transform unit that obtains a Fourier spectrum by Fourier transforming a noisy speech signal that contains a speech signal and a noise signal; a filtering unit that obtains a filtered Fourier spectrum by applying differential filtering to the Fourier spectrum; a noise state determination unit that determines the state of the noise present in a filter bank; a first smoothing unit that obtains a first smoothed spectrum by smoothing the filtered Fourier spectrum in the frequency domain using a first smoothing coefficient set based on the determined filter bank; a second smoothing unit that obtains a second smoothed spectrum by smoothing the first smoothed spectrum in the time domain using a second smoothing coefficient set based on the determined filter bank; and a noise estimation unit that estimates the noise using a forgetting factor set based on the determined filter bank, the second smoothed spectrum, and the minimum value of the second smoothed spectrum.

The noise state determination unit may determine the state of the noise by a magnitude signal-to-noise ratio (MSNR) and a forward search signal-to-noise ratio (FSSNR).

The noise estimation unit may further use the noise estimated in the previous frame to estimate the noise.

The noise state determination unit may variably set the first smoothing coefficient, the second smoothing coefficient, and the forgetting factor according to the result of comparing a filter bank index, determined by the difference between a first exponent that determines the total number of fast Fourier transform (FFT) points and a second exponent that determines the number of filter banks, with a secondary filter bank index that divides the full set of filter banks into low, middle, and high frequency ranges.

The noise state determination unit may also variably set the first smoothing coefficient, the second smoothing coefficient, and the forgetting factor according to the result of comparing the MSNR and the FSSNR with thresholds.

According to the present invention, residual musical noise and perceptible speech distortion in the enhanced speech can be suppressed efficiently.

FIG. 1 is a flowchart illustrating the operation of a noise estimation apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating a noise estimation apparatus that performs the noise estimation method of FIG. 1.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. The embodiments described below are intended to illustrate the technical idea of the present invention, so the technical idea of the present invention should not be construed as limited to these embodiments. Reference numerals attached to components in the description and drawings are given only for convenience of description, and like reference numerals refer to like components throughout the specification.

The following first reviews the model of speech and noise in noisy speech in the frequency domain obtained by the Fourier transform, and then describes the MA method together with the WA, MS, and MCRA methods, which estimate noise at the current frame or frequency bin.

The noisy speech $y(n)$, obtained when a clean speech signal $x(n)$ is corrupted by additive noise $n(n)$, is given by Equation 1.

$$y(n) = x(n) + n(n) \tag{1}$$

In Equation 1, $n$ is the discrete-time index. As shown in Equation 2, $y(n)$ can be approximated by its Fourier spectrum (FS) $Y_i(f)$, obtained with the short-time Fourier transform.

$$Y_i(f) = X_i(f) + N_i(f) \tag{2}$$

In Equation 2, $i$ and $f$ are the frame index and the frequency-bin index, respectively. $X_i(f)$ is the FS of the clean speech, and $N_i(f)$ is the FS of the noise. Squaring the magnitude of Equation 2 gives the Fourier power spectrum (FPS) $|Y_i(f)|^2$, which is expressed by Equation 3.

$$|Y_i(f)|^2 = |X_i(f)|^2 + |N_i(f)|^2 + 2\,|X_i(f)|\,|N_i(f)|\cos(\Delta\theta_i) \tag{3}$$

In Equation 3, $\Delta\theta_i$ is the phase difference between the speech and the noise. When $\cos(\Delta\theta_i)$ in Equation 3 equals 1 or 0, the noisy speech model reduces to Equation 4 or Equation 5, respectively.

$$|Y_i(f)| = |X_i(f)| + |N_i(f)| \tag{4}$$

$$|Y_i(f)|^2 = |X_i(f)|^2 + |N_i(f)|^2 \tag{5}$$

In Equation 4, $|Y_i(f)|$ is the Fourier magnitude spectrum (FMS). Equation 4 follows because, when $\cos(\Delta\theta_i) = 1$, the right-hand side of Equation 3 is the perfect square $\left(|X_i(f)| + |N_i(f)|\right)^2$. Equation 5 follows because, when $\cos(\Delta\theta_i) = 0$, the cross term $2\,|X_i(f)|\,|N_i(f)|\cos(\Delta\theta_i)$ vanishes.
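The relation between the noisy power spectrum and the phase difference can be checked numerically for a single frequency bin. The following is a minimal sketch, assuming the usual cosine-law form |Y|² = |X|² + |N|² + 2|X||N|cos(Δθ) for Equation 3; the function name `noisy_power` and the sample magnitudes and phases are illustrative choices, not values from the specification.

```python
import cmath
import math


def noisy_power(x_mag, n_mag, delta_theta):
    """Cosine-law expression for |Y|^2 given |X|, |N| and their phase difference."""
    return x_mag ** 2 + n_mag ** 2 + 2.0 * x_mag * n_mag * math.cos(delta_theta)


# Build one bin of Y = X + N from polar form (illustrative values).
x_mag, n_mag = 3.0, 4.0
theta_x, theta_n = 0.7, 0.2
X = cmath.rect(x_mag, theta_x)
N = cmath.rect(n_mag, theta_n)
Y = X + N

lhs = abs(Y) ** 2                                    # power computed directly
rhs = noisy_power(x_mag, n_mag, theta_x - theta_n)   # power from the cosine law

# Special cases: cos = 1 gives additive magnitudes, cos = 0 gives additive powers.
in_phase_magnitude = math.sqrt(noisy_power(x_mag, n_mag, 0.0))
orthogonal_power = noisy_power(x_mag, n_mag, math.pi / 2.0)
```

With these values the in-phase case yields a magnitude of |X| + |N| and the orthogonal case yields a power of |X|² + |N|², which are the two limiting models the text distinguishes.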

The MA method obtains a noise estimate by averaging $|Y_i(f)|$ at each frequency bin over the $M_1$ noise frames detected by a VAD-based noise estimation method, and is expressed by Equation 6.

In Equation 6, $|\hat{N}_i(f)|$ is the FMS of the estimated noise. However, because it is difficult to guarantee reliable VAD for speech corrupted by non-stationary noise, the MA method has difficulty producing a correct noise estimate. In addition, VAD frequently demands a relatively high computational cost at several stages of noise estimation.
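The averaging described for the MA method can be sketched as follows. This is an illustration only: the function name `ma_noise_estimate` is hypothetical, the VAD decision is assumed to be supplied as one boolean flag per frame, and the exact indexing of Equation 6 in the specification may differ.

```python
def ma_noise_estimate(magnitude_frames, noise_flags):
    """Average the magnitude spectrum per frequency bin over VAD-flagged noise frames.

    magnitude_frames: list of frames, each a list of |Y_i(f)| values.
    noise_flags: list of booleans, True where a (hypothetical) VAD marked the frame as noise.
    """
    noise_frames = [frame for frame, is_noise in zip(magnitude_frames, noise_flags) if is_noise]
    if not noise_frames:
        raise ValueError("no noise frames were detected")
    m1 = len(noise_frames)                 # number of noise frames (M1 in the text)
    n_bins = len(noise_frames[0])
    return [sum(frame[f] for frame in noise_frames) / m1 for f in range(n_bins)]


frames = [[1.0, 2.0], [10.0, 20.0], [3.0, 4.0]]
flags = [True, False, True]               # the middle frame is treated as speech
estimate = ma_noise_estimate(frames, flags)
```

The sketch makes the dependence on the VAD explicit: a single wrong flag moves a speech frame into the average, which is the weakness the text attributes to this method.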

The WA method obtains a noise estimate by first-order recursive averaging with a fixed forgetting factor $\alpha_1$, and estimates the noise by Equation 7.

Here, $\alpha_1$ ($0 \le \alpha_1 \le 1$) is used to update the noise estimate when the update condition of Equation 7 is satisfied, and a value close to 1 is generally used. $\beta$ is a threshold for implicitly distinguishing between speech and noise. In a non-stationary noise environment, however, the WA method cannot estimate the noise correctly, because its fixed forgetting factor cannot take the changes or the components of the noise into account.
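A first-order recursive update with a fixed forgetting factor can be sketched as below. The update condition (the noisy magnitude being below β times the previous noise estimate) and the default values `alpha1=0.9` and `beta=2.0` are assumptions based on the common weighted-average formulation; the specification's Equation 7 is not reproduced in this text, so it may differ in detail.

```python
def wa_update(prev_noise, noisy_mag, alpha1=0.9, beta=2.0):
    """One frame of a weighted-average noise update with a fixed forgetting factor.

    prev_noise: previous noise magnitude estimate per bin.
    noisy_mag: current noisy magnitude spectrum per bin.
    A bin is updated only when it looks noise-like (assumed condition: y < beta * n_prev).
    """
    updated = []
    for n_prev, y in zip(prev_noise, noisy_mag):
        if y < beta * n_prev:
            updated.append(alpha1 * n_prev + (1.0 - alpha1) * y)
        else:
            updated.append(n_prev)   # likely speech: keep the previous estimate
    return updated


new_noise = wa_update([1.0, 1.0], [1.5, 5.0], alpha1=0.9, beta=2.0)
```

Because `alpha1` is the same in every frame and every bin, the estimate follows a sudden rise in noise level only slowly, which is the limitation noted above.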

The MS method estimates the noise by multiplying the minimum value $D_i^{\min}(f)$, tracked in the smoothed power spectrum $D_i(f)$ of the noisy speech over the search window, by a compensation constant $cf = 1.5$. $D_i(f)$, $D_i^{\min}(f)$, and the noise estimate are defined by Equations 8, 9, and 10, respectively.

In Equations 8 to 10, $\alpha_2$ ($0 \le \alpha_2 \le 1$) is the smoothing factor used to obtain $D_i(f)$, and $M_2$ is the number of past frames spanning about 1.5 seconds. Although the MS method generally gives good noise estimation performance, it requires statistical information from the long span of past frames that makes up the search window.
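The three stages of the MS method (recursive smoothing, minimum tracking over a window, and scaling by the compensation constant) can be sketched as follows. The class name `MinimumStatisticsTracker`, the default `alpha2=0.85`, and the choice to initialise the smoothed spectrum with the first frame are assumptions for illustration; only `cf = 1.5` comes from the text.

```python
from collections import deque


class MinimumStatisticsTracker:
    """Hypothetical helper: noise estimate = cf * min of smoothed power over a window."""

    def __init__(self, window_frames, alpha2=0.85, cf=1.5):
        self.alpha2 = alpha2                        # smoothing factor
        self.cf = cf                                # compensation constant
        self.smoothed = None                        # D_i(f) for the latest frame
        self.history = deque(maxlen=window_frames)  # last M2 smoothed frames

    def update(self, noisy_power):
        if self.smoothed is None:
            self.smoothed = list(noisy_power)       # assumed initialisation
        else:
            a = self.alpha2
            self.smoothed = [a * d + (1.0 - a) * y
                             for d, y in zip(self.smoothed, noisy_power)]
        self.history.append(self.smoothed)
        n_bins = len(self.smoothed)
        minimum = [min(frame[f] for frame in self.history) for f in range(n_bins)]
        return [self.cf * m for m in minimum]


tracker = MinimumStatisticsTracker(window_frames=3, alpha2=0.5)
estimates = [tracker.update([power])[0] for power in (4.0, 2.0, 8.0, 8.0, 8.0)]
```

The `history` buffer is the memory cost the text refers to, and in this run the estimate stays at the old minimum for several frames after the input level rises, which illustrates the slow tracking of increasing noise.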

The noise estimate of MCRA, a variant of the MS method, is given by recursive averaging with a forgetting factor $\alpha_{3,i}(f)$ controlled by the signal presence probability $p_i(f)$. The noise estimate, $\alpha_{3,i}(f)$, and $p_i(f)$ are defined by Equations 11, 12, and 13, respectively.

In Equations 11 to 13, $\alpha_4$ and $\alpha_5$ are the smoothing coefficients of $\alpha_{3,i}(f)$ and $p_i(f)$, respectively. $I_i(f)$ is defined by Equation 14.

The indicator defined by Equation 14 continuously distinguishes between noise and speech by comparison with a threshold. $D_i(f)$ and $D_i^{\min}(f)$ are the smoothed power spectrum of Equation 8 and the tracked minimum of Equation 9, respectively.
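How a signal presence probability can steer the forgetting factor is sketched below, following the commonly published MCRA recursion. The function name `mcra_update`, the threshold name `delta`, the ratio test `d > delta * d_min`, and all default values are assumptions; the specification's Equations 11 to 14 are not reproduced in this text and may be written differently.

```python
def mcra_update(prev_noise, prev_prob, noisy_power, smoothed, smoothed_min,
                alpha4=0.95, alpha5=0.2, delta=5.0):
    """One frame of an MCRA-style update; returns (noise estimate, presence probability).

    smoothed / smoothed_min: smoothed power spectrum and its tracked minimum per bin.
    """
    noise, prob = [], []
    for n_prev, p_prev, y, d, d_min in zip(prev_noise, prev_prob, noisy_power,
                                           smoothed, smoothed_min):
        indicator = 1.0 if d > delta * d_min else 0.0      # speech-present decision
        p = alpha5 * p_prev + (1.0 - alpha5) * indicator   # smoothed presence probability
        alpha3 = alpha4 + (1.0 - alpha4) * p               # probability-controlled factor
        noise.append(alpha3 * n_prev + (1.0 - alpha3) * y)
        prob.append(p)
    return noise, prob


noise, prob = mcra_update(
    prev_noise=[1.0, 1.0], prev_prob=[1.0, 0.0], noisy_power=[3.0, 3.0],
    smoothed=[10.0, 1.0], smoothed_min=[1.0, 1.0],
    alpha4=0.5, alpha5=0.2, delta=5.0,
)
```

In the first bin speech is judged present, so the factor approaches 1 and the noise estimate is held; in the second bin speech is judged absent, so the estimate moves toward the current noisy power.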

The noise estimation method according to the present invention is described in detail below.

FIG. 1 is a flowchart illustrating the operation of a noise estimation apparatus according to an embodiment of the present invention.

Referring to FIG. 1, the method by which the noise estimation apparatus estimates the noise of a noisy speech signal includes a Fourier transform of the input noisy speech signal (S100), filtering by a differentiator (S105), smoothing in the frequency domain (frequency smoothing, S110), adaptive smoothing in the time domain (adaptive time smoothing, S115), obtaining the magnitude SNR and the forward search SNR (S120), and adaptive noise estimation using the magnitude SNR and the forward search SNR (S125). Each step of this embodiment, which processes an input noisy speech signal to estimate the noise, is described in more detail below.

The noise estimation apparatus performs a Fourier transform on the input noisy speech signal $y(n)$ (S100). As in Equation 1, $y(n)$ may be defined as the sum of a clean speech component $x(n)$ and a noise component $n(n)$. The Fourier transform is performed successively on short-time segments of $y(n)$, so that, as in Equation 2, $y(n)$ can be approximated by the Fourier spectrum (FS) $Y_i(f)$.
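Step S100 can be illustrated with a plain discrete Fourier transform applied frame by frame. This is a minimal sketch: the function names `dft` and `short_time_spectra`, the rectangular (unwindowed) frames, and the frame length and hop are assumptions, since the text does not specify a window or an overlap. A practical implementation would use an FFT rather than this O(N²) sum.

```python
import cmath
import math


def dft(frame):
    """Direct discrete Fourier transform of one frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]


def short_time_spectra(y, frame_len, hop):
    """Return one spectrum per frame i, indexed by frequency bin f."""
    return [dft(y[start:start + frame_len])
            for start in range(0, len(y) - frame_len + 1, hop)]


# A cosine with exactly two cycles per 8-sample frame concentrates in bin 2.
signal = [math.cos(2.0 * math.pi * 2.0 * n / 8.0) for n in range(16)]
spectra = short_time_spectra(signal, frame_len=8, hop=8)
magnitudes = [abs(value) for value in spectra[0]]
```

Because the transform is linear, the spectrum of a sum of speech and noise is the sum of their spectra, which is the additive model the text relies on.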

The noise estimation apparatus filters the Fourier spectrum $Y_i(f)$ with a differentiator and outputs the filtered Fourier spectrum $Y'_i(f)$ (S105). The differentiator-based filtering is defined by Equation 15.

In Equation 15, $i$ is the frame index and $\psi(f)$ is a weight that emphasizes the signal components of each frequency band. Speech is divided into voiced sounds, which are produced as the airflow passes through the glottis, and unvoiced sounds, which are produced without passing through the glottis. Unvoiced sounds have lower energy than voiced sounds, are aperiodic signals that resemble white noise, and are distributed over the entire band. For these reasons, unvoiced sounds are harder than voiced sounds to separate from a noisy speech signal. Filtering with a differentiator as in Equation 15, however, makes it possible to efficiently identify and extract the features of unvoiced sounds from a noise-corrupted speech signal.

The noise estimation apparatus performs smoothing in the frequency domain (S110), which yields the first smoothed spectrum $Y''_i(f)$. Smoothing in the frequency domain is defined by Equation 16.

In Equation 16, $a_1, a_2, \ldots, a_{v_1}$ are the $v_1$-th order smoothing coefficients that smooth a non-stationary signal along the frequency axis, and $a_1 + a_2 + \cdots + a_{v_1} = 1$.
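One way to smooth along the frequency axis with coefficients that sum to 1 is a weighted moving average over neighbouring bins. The sketch below is an assumption about the general shape of such a filter, not a reproduction of Equation 16: the function name `smooth_over_frequency`, the centred window, and the clamping of indices at the band edges are all illustrative choices.

```python
def smooth_over_frequency(spectrum, coeffs):
    """Weighted moving average across frequency bins; coeffs must sum to 1."""
    if abs(sum(coeffs) - 1.0) > 1e-9:
        raise ValueError("smoothing coefficients must sum to 1")
    half = len(coeffs) // 2
    last = len(spectrum) - 1
    smoothed = []
    for f in range(len(spectrum)):
        acc = 0.0
        for m, a in enumerate(coeffs):
            idx = min(max(f + m - half, 0), last)   # clamp at the band edges (assumption)
            acc += a * spectrum[idx]
        smoothed.append(acc)
    return smoothed


spike = [0.0, 0.0, 4.0, 0.0, 0.0]
spread = smooth_over_frequency(spike, [0.25, 0.5, 0.25])
```

Requiring the coefficients to sum to 1 keeps a flat spectrum unchanged, so the filter redistributes isolated peaks across neighbouring bins without changing the overall level.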

By smoothing in the frequency domain, the noise estimation apparatus can estimate noise more accurately from speech corrupted by various kinds of non-stationary noise, and can reduce the musical noise and speech distortion that arise when speech is enhanced with an incorrectly estimated noise.

When the prior art is used to estimate noise from speech corrupted by a signal whose noise level varies along the time axis, (i) in regions where the noise increases, the estimate falls below the actual noise and residual noise results, and (ii) in regions where the noise decreases, the estimate exceeds the actual noise and speech distortion results. To estimate the noise reliably, the noise estimation apparatus could therefore use a smoothing coefficient that is fixed along the time axis, as in Equation 8. However, depending on the value chosen for the smoothing coefficient, either the speech characteristics are attenuated or the noise variation remains non-stationary, so these problems persist.

따라서 잡음 추정 장치는 전체 고속 푸리에 변환(Fast Fourier Transform: FFT) 포인트로부터 나뉜 몇 개의 필터뱅크(filter bank) 단위로 계산된 적응적 평활계수를 이용하여, 제1 평활화 스펙트럼 Y"_i(f)에 대해 시간 영역에서의 적응적 평활화를 수행한다(S115). 일 예로서, 잡음 추정 장치는 수학식 17과 같은 시간영역에서의 적응적 평활계수 b₁(j)를 이용하여 제2 평활화 스펙트럼 S_i _,j(k)를 추출한다. Therefore, the noise estimator uses an adaptive smoothing coefficient calculated in units of several filter banks divided from the entire Fast Fourier Transform (FFT) points, thereby providing the first smoothing spectrum Y " _i (f). Adaptive smoothing in the time domain is performed in operation S 115. As an example, the noise estimation apparatus performs a second smoothing spectrum S _i using the adaptive smoothing coefficient b ₁ (j) in the time domain as shown in Equation 17. _{, j} (k) is extracted.

Referring to Equation 17, j (0 ≤ j ≤ 2^(P-p) - 1) is the index of the 2^(P-p) filter banks obtained by dividing the total 2^P FFT points into groups of 2^p, uppercase P is the exponent that determines the number of FFT points, lowercase p is the exponent that determines the number of filter banks, k (0 ≤ k ≤ j·2^p - 1) is the index of a spectral bin within the filter bank, and b₁(j), b₂(j), ..., b_v2(j) are the v2-order adaptive smoothing coefficients. Here, b₁(j) + b₂(j) + ... + b_v2(j) = 1. The coefficients b₁(j), b₂(j), ..., b_v2(j) are set by Table 1, described later.
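The index arithmetic described above can be sketched as follows. This is only an illustration: the function names `filter_bank_layout` and `locate_bin` are hypothetical, and the sketch assumes that k is read as a local offset inside filter bank j, which is one possible reading of the index ranges given in the text.

```python
def filter_bank_layout(P: int, p: int):
    """Return (num_banks, bins_per_bank) for 2**P FFT points grouped 2**p at a time."""
    if not 0 <= p <= P:
        raise ValueError("expected 0 <= p <= P")
    return 2 ** (P - p), 2 ** p


def locate_bin(f: int, P: int, p: int):
    """Map a global FFT bin index f to (j, k).

    j is the filter bank index; k is the offset of the bin inside that bank
    (an assumed convention, not confirmed by the source text).
    """
    num_banks, bins_per_bank = filter_bank_layout(P, p)
    if not 0 <= f < num_banks * bins_per_bank:
        raise ValueError("bin index out of range")
    return divmod(f, bins_per_bank)
```

For example, with P = 8 and p = 4 there are 16 filter banks of 16 bins each, and global bin 37 falls in bank j = 2 at offset k = 5.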

When a relatively low weight is assigned to the smoothing coefficient b₁(j) in a speech-like filter bank, the characteristics of the speech can be represented effectively. When a relatively high weight is assigned to b₁(j) in a noise-like filter bank, the characteristics of the noise can be suppressed effectively. In this way, the speech characteristics are maintained while the spectrum changes stably in response to varying noise. This resolves three problems of the existing approach, which applies a fixed smoothing coefficient that ignores noise variation: it cannot efficiently reduce the effect of varying noise; with a smoothing coefficient close to 1, the noise variation becomes stable but the speech characteristics are attenuated; and with a smoothing coefficient close to 0, the speech characteristics are preserved but the noise variation remains non-stationary.
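The effect of a per-filter-bank coefficient can be illustrated with a minimal sketch. Equation 17 is not reproduced in this text, so the sketch assumes the simplest first-order recursive form, in which the previous smoothed value is weighted by b₁(j) and the current value by 1 - b₁(j); the actual equation uses v2 coefficients and may differ. The function name `smooth_in_time` and its parameters are hypothetical.

```python
def smooth_in_time(prev_smoothed, current, b1_per_bank, bins_per_bank):
    """First-order recursive smoothing with one coefficient per filter bank.

    Assumed form (not taken from the source): S_i = b1 * S_{i-1} + (1 - b1) * Y_i.
    prev_smoothed: smoothed spectrum of the previous frame
    current:       frequency-smoothed spectrum of the current frame
    b1_per_bank:   one coefficient in [0, 1] per filter bank
    """
    if len(prev_smoothed) != len(current):
        raise ValueError("spectra must have the same length")
    if len(b1_per_bank) * bins_per_bank != len(current):
        raise ValueError("coefficients do not cover every bin")
    smoothed = []
    for f, (s_prev, y) in enumerate(zip(prev_smoothed, current)):
        b1 = b1_per_bank[f // bins_per_bank]  # coefficient of the bank holding bin f
        smoothed.append(b1 * s_prev + (1.0 - b1) * y)
    return smoothed
```

With a coefficient near 1 (as for a noise-like bank) the output stays close to the previous frame, and with a coefficient near 0 (as for a speech-like bank) it follows the current frame, which matches the trade-off described above.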

The noise estimation apparatus determines the state of the noise present in each filter bank (S120). For example, the state of the noise may be determined by a magnitude SNR or a forward search SNR, each of which indicates the state of the noise. First, the magnitude SNR may be obtained by Equation 18.

The forward search SNR may be obtained by Equation 19.

Referring to Equations 18 and 19, FBS is the filter bank size, and the noise term is the noise estimated in the previous frame. If γ_i(j) (or φ_i(j)) is close to 1, the corresponding filter bank is a noise-like filter bank; if φ_i(j) (or γ_i(j)) is close to 0, the corresponding filter bank is a speech-like filter bank. In this way, the noise estimation apparatus can determine the noise state of a filter bank according to a criterion based on the magnitude of γ_i(j) or φ_i(j).

Meanwhile, T_{i,j}(k) is the forward search spectrum. Forward searching predicts the state of the noise present in the second smoothed spectrum S_{i,j}(k), either over an entire frame or per subband obtained by dividing a frame. T_{i,j}(k) is defined by Equation 20 below.

Referring to Equation 20, S^min_{i,j}(k) is the minimum spectrum detected from S_{i,j}(k) within a fixed past search window, and c₁(j), c₂(j), ..., c_v2(j) are the v2-order adaptive forgetting coefficients. Here, c₁(j) + c₂(j) + ... + c_v2(j) = 1. In a speech-like filter bank, the noise estimation apparatus may assign a relatively low weight to the adaptive forgetting coefficient c₁(j) so that the forward search spectrum has a noise-like spectrum. In a noise-like filter bank, the noise estimation apparatus may assign a relatively high weight to c₁(j) so that the forward search spectrum has a noise-like spectrum. The coefficients c₁(j), c₂(j), ..., c_v2(j) and the forgetting coefficient d(j) are set by the algorithm of Table 1, described later.
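The minimum spectrum over a past search window can be tracked as in the following sketch. It only illustrates the per-bin minimum over the most recent frames; the class name `MinSpectrumTracker` and the window length are hypothetical, and the combination with the forgetting coefficients in Equation 20 is not shown because that equation is not reproduced in this text.

```python
from collections import deque


class MinSpectrumTracker:
    """Per-bin minimum of the smoothed spectrum over the last `window_frames` frames."""

    def __init__(self, window_frames: int):
        if window_frames < 1:
            raise ValueError("window_frames must be at least 1")
        # deque with maxlen drops the oldest frame automatically
        self._frames = deque(maxlen=window_frames)

    def update(self, smoothed):
        """Add the current smoothed spectrum and return the per-bin minimum."""
        self._frames.append(list(smoothed))
        return [min(values) for values in zip(*self._frames)]
```

Storing every frame in the window costs memory proportional to the window length; a production implementation would likely use a cheaper running-minimum scheme, but the stored-window version keeps the definition easy to verify.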

Referring to Table 1, FB2(m) is the index of the secondary filter banks obtained by dividing the full set of filter banks into user-defined bass, midrange, and treble bands, M is the total number of secondary filter banks, TH^{FB2(m)}_φ(l) is the threshold for φ_i(j), and TH^{FB2(m)}_γ(l) is the threshold for γ_i(j). L is the total number of steps defined by the user according to TH^{FB2(m)}_φ(l) and TH^{FB2(m)}_γ(l). W^{FB2(m)}_b1(l), ..., W^{FB2(m)}_bv3(l) are the weights of b₁(j), b₂(j), ..., b_v2(j), W^{FB2(m)}_c1(l), ..., W^{FB2(m)}_cv3(l) are the weights of c₁(j), c₂(j), ..., c_v2(j), and W^{FB2(m)}_d(l) is the weight of d(j).
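The stepwise selection that this description implies can be sketched as a threshold lookup. Table 1 itself is not reproduced in this text, so everything concrete here is an assumption: the function names, the ascending threshold list, the use of a single indicator value, and the example weights are all hypothetical, and the real table is additionally indexed by the secondary filter bank FB2(m).

```python
from bisect import bisect_right


def select_step(indicator, thresholds):
    """Return the step index l for an indicator value, given ascending thresholds."""
    return bisect_right(thresholds, indicator)


def lookup_weights(indicator, thresholds, weights_per_step):
    """Pick the weight set of the step that the indicator falls into.

    thresholds:       L - 1 ascending thresholds (hypothetical values)
    weights_per_step: L weight sets, one per step (hypothetical values)
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be ascending")
    if len(weights_per_step) != len(thresholds) + 1:
        raise ValueError("need exactly one weight set per step")
    return weights_per_step[select_step(indicator, thresholds)]


# Hypothetical three-step table: indicator near 0 is speech-like, near 1 is noise-like.
EXAMPLE_THRESHOLDS = [0.3, 0.7]
EXAMPLE_WEIGHTS = [
    {"b1": 0.2, "c1": 0.2, "d": 0.5},  # speech-like step: low b1 and c1
    {"b1": 0.5, "c1": 0.5, "d": 0.7},  # intermediate step
    {"b1": 0.9, "c1": 0.9, "d": 0.9},  # noise-like step: high b1 and c1
]
```

For instance, `lookup_weights(0.85, EXAMPLE_THRESHOLDS, EXAMPLE_WEIGHTS)` selects the noise-like step. The low-to-high ordering of b1 and c1 follows the text; the d values are placeholders only.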

The noise estimation apparatus obtains the estimated noise by Equation 21 below, using d(j) computed by the algorithm of Table 1 and the minimum value S^min_{i,j}(k) (S125).

As described in detail above, the noise estimation method according to the present invention estimates noise by applying an adaptive forgetting coefficient, instead of the existing WA technique, which applies a fixed forgetting factor to every frame regardless of noise variation. Thus, unlike the prior art, which relies on statistical information from multiple noise frames obtained by VAD, a correct VAD can be obtained even when the noise environment is non-stationary in various ways, in intervals where the speech energy is weak, or at low SNR, and noise estimation can be performed adaptively in speech regions, which enables reliable noise estimation. In addition, because the present embodiment requires relatively little computation for noise estimation and does not require a large memory capacity, it is easy to implement in actual hardware or software.

FIG. 2 is a block diagram of a noise estimation apparatus that performs the noise estimation method of FIG. 1. Referring to FIG. 2, the noise estimation apparatus 200 includes a Fourier transform unit 205 for an input noisy speech signal, a filtering unit 210, a first smoothing unit 215, a second smoothing unit 220, a noise state determination unit 225, and a noise estimation unit 230. The functions of the components 205, 210, 215, 220, 225, and 230 of the noise estimation apparatus 200 are the same as those described for steps S100, S105, S110, S115, S120, and S125 of the noise estimation procedure according to the embodiment of FIG. 1, so a detailed description is omitted here. The noise estimation apparatus 200 may be provided in a speech-based application device, such as a speakerphone, a video-call communication device, a hearing aid, or a Bluetooth device, or in a speech recognition system, where it determines the state of the noise from the input noisy speech signal and uses it for noise estimation, sound quality improvement, and/or speech recognition.

The above description merely illustrates the technical idea of the present invention, and those of ordinary skill in the art to which the present invention pertains may make various modifications and variations without departing from its essential characteristics. Therefore, the embodiments disclosed herein are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present invention should be construed according to the following claims, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of the present invention.

Claims

A noise estimation method comprising:
Fourier transforming a noisy speech signal including a speech signal and a noise signal to obtain a Fourier spectrum;
differentially filtering the Fourier spectrum to obtain a filtered Fourier spectrum;
determining a state of the noise present in a filter bank;
obtaining a first smoothed spectrum by smoothing the filtered Fourier spectrum in a frequency domain using a first smoothing coefficient set based on the determined filter bank;
obtaining a second smoothed spectrum by smoothing the first smoothed spectrum in a time domain using a second smoothing coefficient set based on the determined filter bank; and
estimating noise using a forgetting coefficient set based on the determined filter bank, the second smoothed spectrum, and a minimum value of the second smoothed spectrum,
wherein the state of the noise is determined by a magnitude signal-to-noise ratio (MSNR) and a forward search signal-to-noise ratio (FSSNR).

delete

The method of claim 1, wherein estimating the noise comprises estimating the noise further using the noise estimated in a previous frame.

The method of claim 1, wherein the minimum value of the second smoothed spectrum is a minimum spectrum detected from a fixed past search window.

The method of claim 1, wherein the first smoothing coefficient, the second smoothing coefficient, and the forgetting coefficient are variably set according to a result of comparing a filter bank index, determined by the difference between a first exponent for determining the total number of fast Fourier transform (FFT) points and a second exponent for determining the number of filter banks, with a secondary filter bank index obtained by dividing the entire filter bank into bass, midrange, and treble.

The method of claim 1, wherein the first smoothing coefficient, the second smoothing coefficient, and the forgetting coefficient are variably set according to a result of comparing the magnitude signal-to-noise ratio and the forward search signal-to-noise ratio with a threshold.

A noise estimation apparatus comprising:
a Fourier transform unit for Fourier transforming a noisy speech signal including a speech signal and a noise signal to obtain a Fourier spectrum;
a filtering unit for differentially filtering the Fourier spectrum to obtain a filtered Fourier spectrum;
a noise state determination unit for determining a state of the noise present in a filter bank;
a first smoothing unit for obtaining a first smoothed spectrum by smoothing the filtered Fourier spectrum in a frequency domain using a first smoothing coefficient set based on the determined filter bank;
a second smoothing unit for obtaining a second smoothed spectrum by smoothing the first smoothed spectrum in a time domain using a second smoothing coefficient set based on the determined filter bank; and
a noise estimation unit for estimating noise using a forgetting coefficient set based on the determined filter bank, the second smoothed spectrum, and a minimum value of the second smoothed spectrum,
wherein the noise state determination unit determines the state of the noise by a magnitude signal-to-noise ratio (MSNR) and a forward search signal-to-noise ratio (FSSNR).

delete

The apparatus of claim 7, wherein the noise estimation unit estimates the noise further using the noise estimated in a previous frame.

The apparatus of claim 7, wherein the minimum value of the second smoothed spectrum is a minimum spectrum detected from a fixed past search window.

The apparatus of claim 7, wherein the noise state determination unit variably sets the first smoothing coefficient, the second smoothing coefficient, and the forgetting coefficient according to a result of comparing a filter bank index, determined by the difference between a first exponent for determining the total number of fast Fourier transform (FFT) points and a second exponent for determining the number of filter banks, with a secondary filter bank index obtained by dividing the entire filter bank into bass, midrange, and treble.

The apparatus of claim 7, wherein the noise state determination unit variably sets the first smoothing coefficient, the second smoothing coefficient, and the forgetting coefficient according to a result of comparing the magnitude signal-to-noise ratio and the forward search signal-to-noise ratio with a threshold.