KR20030078218A - Noise suppression method and apparatus

Info

Publication number: KR20030078218A
Application number: KR1020020017106A
Authority: KR
Inventors: 최승호; 최창규; 김상룡
Original assignee: 삼성전자주식회사
Priority date: 2002-03-28
Filing date: 2002-03-28
Publication date: 2003-10-08
Also published as: KR100446626B1

Abstract

The present invention relates to a method and apparatus for removing noise from a speech signal.

The noise removal method according to the present invention comprises: separating a speech signal and a noise signal from two or more mixed signals in which speech and noise are mixed, using independent component analysis; selecting the speech signal from among the separated signals; and removing residual noise from the selected speech signal using spectral subtraction.

According to the present invention, ambient noise is effectively removed in the preprocessing stage of speech recognition, which improves the recognition results.

Description

Noise suppression method and apparatus for removing noise from speech signal

The present invention relates to a method and apparatus for removing noise from a speech signal, and more particularly to a method and apparatus for removing noise from a speech signal in the preprocessing stage of speech recognition.

The recognition rate of a speech recognizer can drop sharply when ambient noise is mixed into the speech signal. This is mainly due to the mismatch between the speech database used to build the speech models and the input data at recognition time. To overcome this, research on recovering the original, noise-free speech signal from a mixture of speech and noise has been actively pursued since the 1990s.

Two approaches dominate this research: spectral subtraction, which subtracts the noise spectrum in the spectral domain, and independent component analysis (ICA), which separates the original signals using two or more microphones. Spectral subtraction methods are disclosed in U.S. Patent Nos. 6,289,309, 5,943,429, 5,839,101, and 5,687,243, and in Korean laid-open patent application No. 1999-36115. Methods for blind signal separation (BSS), which use ICA to separate mixed signals into the original (source) signals, are disclosed in U.S. Patent Nos. 5,999,567, 5,706,402, and 5,675,659. Until now, ICA and spectral subtraction have each been used separately.

However, signal separation methods such as ICA rely on statistical signal processing, so their separation performance often falls short of expectations depending on the amount of data available or the processing speed of the processor. With spectral subtraction, when the signal-to-noise ratio (SNR) is low it is difficult to accurately estimate the noise intervals or the noise spectrum, and this distorts the original speech.

To solve these problems, an object of the present invention is to provide a method and apparatus for removing noise from a speech signal that use independent component analysis and spectral subtraction in series to obtain a noise-attenuated speech signal from a mixture of speech and noise, thereby improving noise attenuation.

Fig. 1 is a block diagram showing an example of a preprocessing apparatus for speech recognition according to the present invention.

Fig. 2 is a diagram explaining the independent component analysis method used in the present invention.

Fig. 3 is a diagram explaining the spectral subtraction method used in the present invention.

Fig. 4(a) shows the speech recognition result for the original mixed signal.

Fig. 4(b) shows the speech recognition result when spectral subtraction is applied.

Fig. 4(c) shows the speech recognition result when independent component analysis is applied.

Fig. 4(d) shows the speech recognition result when both independent component analysis and spectral subtraction are applied.

Fig. 5 is a table giving numerical values for the speech recognition results of the four cases shown in Fig. 4.

* Explanation of reference numerals for the main parts of the drawings *

100: speech recognition preprocessing apparatus

110: independent component analyzer

120: channel selector

130: spectrum subtractor

To achieve the above object, one aspect of the present invention is a method for removing noise from a speech signal, comprising: separating a speech signal and a noise signal from two or more mixed signals in which speech and noise are mixed, using independent component analysis; selecting the speech signal from among the separated signals; and removing residual noise from the selected speech signal using spectral subtraction.

Another aspect of the present invention is a speech recognition method that includes the above noise removal method.

Yet another aspect of the present invention is an apparatus for removing noise from a speech signal, comprising: a signal separator that separates a speech signal and a noise signal from two or more mixed signals in which speech and noise are mixed, using independent component analysis; a signal selector that selects the speech signal from among the signals separated by the signal separator; and a noise remover that removes residual noise from the speech signal selected by the signal selector, using spectral subtraction.

Yet another aspect of the present invention is a speech recognition apparatus that includes the above noise removal apparatus.

The present invention is described in detail below with reference to the accompanying Figs. 1 to 5.

Fig. 1 shows an example of a speech recognition apparatus according to the present invention. The speech recognition apparatus includes a speech recognition preprocessing apparatus (100) and a speech recognizer (140). The speech recognition preprocessing apparatus (100) includes an independent component analyzer (110), a channel selector (120), and a spectrum subtractor (130).

The independent component analyzer (110) uses the ICA technique to separate the mixed signals into the original signals. That is, from the mixed signals received through N (> 1) channels (microphones), it obtains signals in which the speech and the noise are separated.

The channel selector (120) selects, from the separated signals output by the independent component analyzer (110), the channel judged to carry the speech signal. Because a noise signal is usually smaller in magnitude than the speech signal, the channel selector can be implemented simply by selecting the largest of the separated signals. When the noise and speech signals are of similar magnitude, a noise attenuation algorithm such as spectral subtraction can be applied to each channel, and the channel whose output has the largest total signal energy can be selected. The invention is not limited to these methods, however; any means of selecting the desired signal from the separated signals may be used.
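The simple magnitude-based rule described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `select_speech_channel`, the use of total energy as the "magnitude" measure, and the synthetic test signals are all assumptions.

```python
import numpy as np

def select_speech_channel(separated):
    """Return (index, signal) of the separated channel with the largest energy.

    `separated` is a 2-D array of shape (num_channels, num_samples).
    """
    separated = np.asarray(separated, dtype=float)
    energies = np.sum(separated ** 2, axis=1)   # total energy per channel
    best = int(np.argmax(energies))
    return best, separated[best]

# Usage with two synthetic channels: a loud speech-like tone and quiet noise.
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
speech_like = np.sin(2 * np.pi * 200 * t)
noise_like = 0.1 * rng.standard_normal(t.size)
index, chosen = select_speech_channel(np.stack([noise_like, speech_like]))
```

ICA generally recovers sources only up to an arbitrary scale, so a practical selector may need to normalize the channels or fall back on the second rule in the text (compare energies after a noise attenuation pass).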

The spectrum subtractor (130) applies spectral subtraction to the channel selected by the channel selector (120) in order to attenuate the residual noise.

When the speech and noise signals are first separated from the mixed signal by independent component analysis, the SNR of the separated speech signal increases. Applying spectral subtraction to this higher-SNR speech signal lets the spectrum subtractor remove the residual noise more effectively.

The noise-attenuated speech signal output by the spectrum subtractor (130) can be used as the input to speech recognition, or it can be output as is and serve as the input signal for another application.
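The serial arrangement of the three units can be summarized in a short sketch. The function `preprocess` and the three stand-in callables are hypothetical; they only show the data flow between the analyzer (110), selector (120), and subtractor (130) and do no real signal processing.

```python
import numpy as np

def preprocess(mixtures, separate, select, denoise):
    """Serial front end: separation -> channel selection -> residual-noise removal."""
    separated = separate(mixtures)      # role of the independent component analyzer (110)
    speech = select(separated)          # role of the channel selector (120)
    return denoise(speech)              # role of the spectrum subtractor (130)

# Trivial stand-ins, only to show the data flow (assumed, not real algorithms).
identity_separate = lambda x: np.asarray(x, dtype=float)
max_energy_select = lambda s: s[np.argmax(np.sum(s ** 2, axis=1))]
identity_denoise = lambda y: y

mixtures = np.array([[0.1, -0.1, 0.1, -0.1],
                     [1.0, -1.0, 1.0, -1.0]])
enhanced = preprocess(mixtures, identity_separate, max_energy_select, identity_denoise)
```

Passing the stages in as callables keeps the ordering explicit and makes it easy to swap in any ICA or spectral subtraction variant, in line with the text's statement that the specific algorithms are not limiting.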

The independent component analysis method used in the present invention is now described with reference to Fig. 2. ICA extracts the original signals from mixed signals through neural network learning, under the assumption that the original signals are mutually independent.

As shown in Fig. 2(a), the independent-component blind (source) signals, written here as the vector $\mathbf{s}$, are mixed, and the microphones receive the mixed signals $\mathbf{x}$. As in the equation of Fig. 2(b), the microphone input is assumed to be the blind signal multiplied by a mixing matrix $A$, that is, $\mathbf{x} = A\mathbf{s}$. As shown in Fig. 2(c), the mixed signals $\mathbf{x}$ are separated into independent components by a matrix $W$, giving $\mathbf{u} = W\mathbf{x}$. In other words, the blind signals $\mathbf{s}$ are estimated by means of the matrix $W$ (Fig. 2(d)).
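The linear mixing and unmixing model can be illustrated numerically. In this sketch, `s` stands for the independent sources, `A` for the mixing matrix, `x` for the microphone signals, `W` for the unmixing matrix, and `u` for the recovered signals; the particular sources and the values in `A` are made up. Note that `W` is taken here as the exact inverse of `A` purely to show the model; an actual ICA algorithm has to learn `W` from `x` alone and typically recovers the sources only up to scale and ordering.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Two independent, non-Gaussian sources (hypothetical stand-ins for speech and noise).
s = np.vstack([np.sign(rng.standard_normal(n)) * rng.exponential(1.0, n),
               rng.uniform(-1.0, 1.0, n)])

A = np.array([[1.0, 0.6],
              [0.4, 1.0]])      # assumed mixing matrix (unknown in practice)
x = A @ s                        # what the two microphones observe: x = A s

W = np.linalg.inv(A)             # oracle unmixing matrix, for illustration only
u = W @ x                        # recovered sources: u = W x

max_error = float(np.max(np.abs(u - s)))
```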

Next, the spectral subtraction method used in the present invention is described. In spectral subtraction, the input signal, a mixture of speech and noise, is analyzed into separate frequency bands, a gain is applied to the signal in each band, and the gain-adjusted signals are recombined to produce the output signal. The gain applied to each band is time-varying and depends on the background noise level, the desired amount of suppression, and the SNR in that band.
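The analyze, apply-gain, recombine structure can be sketched with a basic power spectral subtraction. This is a textbook-style simplification, not the specific method used in the patent's experiments: the function name, the frame length, the assumption that the first `noise_frames` frames are noise-only, and the gain floor are all illustrative choices.

```python
import numpy as np

def spectral_subtract(noisy, frame_len=256, noise_frames=10, floor=0.01):
    """Basic power spectral subtraction with 50% overlap-add.

    Assumes the first `noise_frames` frames contain noise only.
    """
    hop = frame_len // 2
    window = np.hanning(frame_len + 1)[:-1]         # periodic Hann: sums to 1 at 50% overlap
    n_frames = 1 + (len(noisy) - frame_len) // hop
    frames = np.stack([noisy[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectra = np.fft.rfft(frames, axis=1)            # analysis into frequency bands
    power = np.abs(spectra) ** 2
    noise_power = power[:noise_frames].mean(axis=0)  # per-band noise estimate

    # Time-varying per-band gain; the floor limits over-subtraction artifacts.
    gain = np.sqrt(np.maximum(1.0 - noise_power / np.maximum(power, 1e-12), floor))
    cleaned = np.fft.irfft(gain * spectra, n=frame_len, axis=1)

    out = np.zeros(len(noisy))
    for i in range(n_frames):                        # recombine bands and overlap-add
        out[i * hop:i * hop + frame_len] += cleaned[i]
    return out

# Usage: a 440 Hz tone that starts after 0.5 s of silence, buried in white noise.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(2 * fs) / fs
clean = np.where(t >= 0.5, np.sin(2 * np.pi * 440 * t), 0.0)
noisy = clean + 0.3 * rng.standard_normal(t.size)
enhanced = spectral_subtract(noisy)
```

The gain in each band shrinks toward the floor when that band's power is close to the noise estimate and approaches 1 when the band is dominated by the signal, which is why a higher input SNR makes the subtraction more reliable.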

Spectral subtraction can be implemented in various forms; Fig. 3 shows one example of the spectral subtraction method used in the present invention.

The spectrum subtractor (300) shown in Fig. 3 includes a current SNR estimator (310), a speech absence probability (SAP) calculator (320), an SNR modifier (330), a gain calculator (340), and a noise/speech spectrum updater (350).

The current SNR estimator (310) divides the noisy input speech signal into frames, transforms each frame into a frequency-domain signal, and estimates the SNR of the current frame and the SNR of the previous frame. The speech absence probability calculator (320) calculates the speech absence probability from the SNR of the current frame and the predicted SNR of the current frame, which is predicted from the previous frame. The SNR modifier (330) uses the speech absence probability calculated by the speech absence probability calculator (320) to modify the two SNRs calculated by the current SNR estimator (310). The gain calculator (340) calculates the gain of the current frame, which is determined from the two SNRs modified by the SNR modifier (330); the calculated gain is multiplied by the speech signal spectrum of the current frame, and the result is output as enhanced speech. The noise/speech spectrum updater (350) estimates the noise and speech power of the next frame, derives the predicted SNR, and outputs it to the current SNR estimator (310). The noise removal apparatus of Fig. 3 is only an example, however; the spectral subtraction used in the present invention may take any form.
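The feedback loop among these units can be sketched in simplified form. The code below is an assumption-laden illustration, not the algorithm of Fig. 3 or the global soft decision method: it uses a decision-directed predicted SNR, a Gaussian likelihood-ratio speech absence probability, and a Wiener-style gain, and the parameters `alpha`, `q`, and the noise update rate of 0.1 are hypothetical choices.

```python
import numpy as np

def snr_driven_gains(power, noise_power, alpha=0.98, q=0.5):
    """Per-frame, per-band gains from a simplified SNR-tracking loop.

    power:        (frames, bands) noisy power spectra
    noise_power:  (bands,) initial noise power estimate (must be positive)
    alpha, q:     assumed smoothing factor and prior speech-absence probability
    """
    noise = np.array(noise_power, dtype=float)
    prev_speech = np.zeros_like(noise)
    gains = np.empty_like(power, dtype=float)
    for t, frame in enumerate(power):
        post_snr = frame / noise                                # current-frame SNR
        prior_snr = (alpha * prev_speech / noise                # SNR predicted from previous frame
                     + (1 - alpha) * np.maximum(post_snr - 1.0, 0.0))
        # Speech-absence probability from a Gaussian likelihood ratio.
        v = np.minimum(prior_snr * post_snr / (1.0 + prior_snr), 50.0)
        ratio = np.exp(v) / (1.0 + prior_snr)
        p_absent = q / (q + (1.0 - q) * ratio)
        # Gain: Wiener rule weighted by the probability that speech is present.
        gains[t] = (1.0 - p_absent) * prior_snr / (1.0 + prior_snr)
        prev_speech = gains[t] ** 2 * frame                      # speech power update
        # Noise update, leaning on frames judged to be speech-free.
        noise = np.maximum(noise + 0.1 * p_absent * (frame - noise), 1e-12)
    return gains

# Usage: 4 bands of unit-power noise; band 2 carries strong speech from frame 10 on.
power = np.ones((20, 4))
power[10:, 2] = 100.0
gains = snr_driven_gains(power, np.ones(4))
```

In this toy input the gain stays near zero in the noise-only bands and rises toward 1 in the band carrying speech within a few frames, which mirrors the role of the predicted SNR fed back from one frame to the next.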

To verify the effect of the speech recognition preprocessing method according to the present invention, a speech recognition experiment was carried out using a Korean connected-digit recognizer. The results showed a large increase in the recognition rate when noise was mixed into the speech signal.

The connected digit strings were composed randomly of 3 to 7 digits. 93 speakers took part for model training and 47 speakers (1729 sentences, 8872 words) for recognition. The sampling rate and quantization were 8 kHz and 16 bits, respectively. The feature vector has 26 dimensions, consisting of 12th-order MFCC, 12th-order delta-MFCC, energy, and delta energy. Continuous-density hidden Markov models (HMMs) were used for modeling. The noise data comprised 100 km/h car driving noise, white Gaussian noise, babble noise, and so on. The baseline recognition result (word recognition rate) of this system in a noise-free environment is 92.67%.
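The 26-dimensional layout (12 + 12 + 1 + 1) can be made concrete with a small sketch. The function `build_features` is hypothetical, the inputs are random placeholders rather than real MFCCs, and `np.gradient` is used as a simple stand-in for whatever delta computation the original recognizer used.

```python
import numpy as np

def build_features(mfcc, log_energy):
    """Stack static and delta features into one vector per frame.

    mfcc:        (frames, 12) cepstral coefficients
    log_energy:  (frames,) frame energy
    """
    static = np.hstack([mfcc, log_energy[:, None]])   # 12 + 1 = 13 static features
    delta = np.gradient(static, axis=0)                # simple time derivative per feature
    d_mfcc, d_energy = delta[:, :12], delta[:, 12:]
    # Order used in the text: MFCC, delta-MFCC, energy, delta energy.
    return np.hstack([mfcc, d_mfcc, log_energy[:, None], d_energy])

rng = np.random.default_rng(0)
features = build_features(rng.standard_normal((50, 12)), rng.standard_normal(50))
```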

In the speech recognition experiment, the independent component analysis algorithm was the time-domain feedback algorithm disclosed in U.S. Patent No. 5,675,659 (title: Methods and apparatus for blind separation of delayed and filtered sources), and the spectral subtraction was a method based on global soft decision (N. Kim and J. Chang, "Spectral enhancement based on global soft decision," IEEE Signal Processing Letters, Vol. 7, pp. 108-110, 2000).

Fig. 4(a) shows the speech recognition result for the original mixed signal, in which noise is mixed with the speech. A substantial noise signal is present alongside the original speech signal.

Fig. 4(b) shows the speech recognition result after preprocessing the mixed signal with spectral subtraction. The noise is reduced in magnitude, but spectral subtraction reduces noise more effectively the higher the SNR, and the SNR of the mixed signal is not high. As a result, a considerable amount of noise remains after preprocessing with spectral subtraction alone.

Fig. 4(c) shows the speech recognition result after preprocessing the mixed signal with independent component analysis. The noise signal in the result is smaller than with spectral subtraction.

Fig. 4(d) shows the speech recognition result after preprocessing in which independent component analysis is first applied to the mixed signal to remove the noise signal, and spectral subtraction is then applied to the resulting speech signal to remove the residual noise. Separating the speech and noise signals by independent component analysis first reduces the noise remaining in the separated speech signal, so its SNR increases. Applying spectral subtraction to this higher-SNR speech signal yields, as shown in Fig. 4(d), far less noise than preprocessing with independent component analysis alone.

Fig. 5 gives the numerical results of the speech recognition experiment for the original mixed signal without preprocessing, the signal processed by spectral subtraction, the signal processed by independent component analysis, and the signal processed by both independent component analysis and spectral subtraction.

The anechoic results were obtained in an environment with no echo at all, and the reverberant-room simulation results in an environment where echo was not suppressed. Independent component analysis improves the recognition results more than spectral subtraction does, and performing spectral subtraction after independent component analysis (anechoic room: 82.29; reverberant room: 60.96) improves further on independent component analysis alone (anechoic room: 79.39; reverberant room: 56.51).
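The size of the improvement from adding spectral subtraction after ICA follows from simple arithmetic on the four figures quoted above. The sketch assumes these figures are word recognition rates in percent, like the 92.67% baseline, and treats 100 minus the rate as the word error; the function `summarize` is illustrative.

```python
# Word recognition rates (%) quoted in the text for the two ICA-based front ends.
ica_only = {"anechoic": 79.39, "reverberant": 56.51}
ica_plus_ss = {"anechoic": 82.29, "reverberant": 60.96}

def summarize(before, after):
    """Absolute accuracy gain and relative reduction in word error, in percent."""
    gain = after - before
    error_reduction = 100.0 * gain / (100.0 - before)
    return round(gain, 2), round(error_reduction, 1)

summary = {room: summarize(ica_only[room], ica_plus_ss[room]) for room in ica_only}
```

Under these assumptions the combined front end gains 2.90 points in the anechoic room and 4.45 points in the reverberant room, which corresponds to roughly 14.1% and 10.2% fewer word errors than ICA alone.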

According to the present invention as described above, independent component analysis and spectral subtraction are connected in series in the speech recognition preprocessing stage: the speech and noise signals are first separated, and the residual noise is then removed from the separated speech signal. Ambient noise, the greatest obstacle to the practical use of speech recognizers, is thereby effectively removed in the preprocessing stage, which improves the speech recognition results.

Claims

1. A method for removing noise from a speech signal, the method comprising:

separating a speech signal and a noise signal from two or more mixed signals in which speech and noise are mixed, using independent component analysis;

selecting the speech signal from among the separated signals; and

removing residual noise from the selected speech signal using spectral subtraction.

2. A speech recognition method comprising the noise removal method according to claim 1.

3. An apparatus for removing noise from a speech signal, the apparatus comprising:

a signal separator that separates a speech signal and a noise signal from two or more mixed signals in which speech and noise are mixed, using independent component analysis;

a signal selector that selects the speech signal from among the signals separated by the signal separator; and

a noise remover that removes residual noise from the speech signal selected by the signal selector, using spectral subtraction.

4. A speech recognition apparatus comprising the noise removal apparatus according to claim 3.