CN103559888B

CN103559888B - Speech enhancement method based on the principle of non-negative low-rank and sparse matrix decomposition

Info

Publication number: CN103559888B
Application number: CN201310548773.9A
Authority: CN
Inventors: 孙成立; 须明; 王希敏; 谢坚筱
Original assignee: KEY LABORATORY OF SCIENCE AND TECHNOLOGY ON AVIONICS INTEGRATION TECHNOLOGIES
Current assignee: KEY LABORATORY OF SCIENCE AND TECHNOLOGY ON AVIONICS INTEGRATION TECHNOLOGIES
Priority date: 2013-11-07
Filing date: 2013-11-07
Publication date: 2016-10-05
Anticipated expiration: 2033-11-07
Also published as: CN103559888A

Abstract

The invention discloses a speech enhancement method based on the principle of non-negative low-rank and sparse matrix decomposition. The method first smooths the noisy speech signal, divides it into frames and applies the discrete Fourier transform to obtain the noisy speech spectrum. The magnitude spectrum of each frame is then taken as a column vector, and the columns are arranged in time order to form a noisy speech time-frequency matrix. Non-negative low-rank and sparse matrix decomposition of this matrix yields a non-negative low-rank matrix and a non-negative sparse matrix. The enhanced speech spectrum is reconstructed from the sparse matrix and the phase of the noisy speech, and the enhanced time-domain speech is finally obtained by the inverse Fourier transform. The invention adapts well to different noises, requires neither endpoint detection nor model training, has few parameters that are easy to tune, performs well in strong noise, and has good application prospects.

Description

Speech enhancement method based on the principle of non-negative low-rank and sparse matrix decomposition

Technical field

The present invention relates to the field of signal processing and is applicable to noise suppression in noisy speech; in particular, it relates to a speech enhancement method based on the principle of non-negative low-rank and sparse matrix decomposition.

Background art

Speech is the most natural and effective means of human communication. As humanity enters the information age, advanced speech processing technology is urgently needed to make society more intelligent. As early as 2000, Bill Gates predicted that "the next 10 years will be the era of speech". In recent years, companies such as Apple, Google and Microsoft have successively launched intelligent speech services; intelligent speech has become a new industry in information technology, and both user awareness and market size are steadily growing. In particular, Apple's recent smartphones include a voice assistant, and the speech "cloud" launched by iFLYTEK opens even broader applications for intelligent speech technology. However, speech communication and applications are inevitably disturbed by noise from the surrounding environment, the transmission medium and the communication equipment itself, which seriously limits the practical use of intelligent speech technology.

Speech enhancement is an effective technique against noise pollution. It suppresses the interference of noise with speech so that the distortion between the processed speech and the original clean speech is minimized. Many speech enhancement algorithms have emerged over the past few decades; typical ones include spectral subtraction, minimum mean-square error estimation of the spectral amplitude, subspace methods and wavelet denoising. In environments where the noise is not too strong, speech enhancement has been solved effectively. However, because natural noise is diverse and speech itself is complex, speech enhancement algorithms behave differently in different application environments, which makes the research very difficult; speech enhancement under strong noise and in varied noise environments has not yet been solved well.

Many existing speech enhancement algorithms try to remove as much noise as possible using probability density models of the speech and noise signals. Recent research, however, shows that no single distribution fits all speech or all noise, so more flexible mathematical models and model-estimation algorithms are needed to adapt to the characteristics of the signal itself. In addition, noise estimation is an indispensable preliminary step in existing speech enhancement algorithms: it provides the noise power spectrum and the a priori signal-to-noise ratio of the speech, both of which are essential for good enhancement. Existing methods use speech endpoint detection to divide the recorded signal into noise-only segments and noisy speech segments, and estimate and update the noise from the noise-only segments. This is a suboptimal estimate, because in practice the instantaneous noise in the noise-only segments and in the noisy speech segments is not exactly the same, so this kind of noise estimation always introduces error. Moreover, current endpoint detection is still immature at low signal-to-noise ratios and under non-stationary noise; it is prone to misjudgment, which leaves considerable residual noise in the speech.

Recent research on compressive sensing theory shows that many practical observations can be modeled as the sum of a low-rank component and a sparse component. Decomposing a matrix into low-rank and sparse parts makes it possible to recover the original data from data contaminated by large noise or outliers. Low-rank and sparse matrix decomposition has been applied in many fields of science and technology, such as image enhancement, video object detection and data mining.

Stationary random noise and periodic noise are the two most common kinds of noise. Stationary random noise is described by its first- and second-order statistics: its mean and autocorrelation function do not depend on time. Because the Fourier transform of the autocorrelation function of a random signal is its power spectrum, every frame of stationary random noise has the same expected spectrum, so its (expected) time-frequency matrix has identical columns and is a low-rank matrix of rank 1. Likewise, periodic noise has values only at certain fixed frequencies, so the column vectors of its time-frequency matrix are strongly correlated, and its time-frequency matrix is also low-rank.
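To illustrate the low-rank argument numerically, the sketch below builds a magnitude time-frequency matrix from synthetic white Gaussian noise (an assumed stand-in for stationary random noise) and inspects its singular values. A single realization is not exactly rank 1, but because the expected magnitude is the same in every frame, the first singular value carries most of the energy. The helper name `magnitude_tf_matrix` is hypothetical; the 200-sample Hamming window and 80-sample overlap are borrowed from the preprocessing step of this document.

```python
import numpy as np

def magnitude_tf_matrix(x, frame_len=200, overlap=80):
    """Magnitude time-frequency matrix with one column per Hamming-windowed frame."""
    hop = frame_len - overlap
    win = np.hamming(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[m * hop : m * hop + frame_len] * win for m in range(n_frames)], axis=1)
    return np.abs(np.fft.rfft(frames, axis=0))  # shape: (frame_len // 2 + 1, n_frames)

rng = np.random.default_rng(0)
noise = rng.standard_normal(120 * 200 + 80)  # stationary white Gaussian noise, 200 frames
Yn = magnitude_tf_matrix(noise)

sv = np.linalg.svd(Yn, compute_uv=False)       # singular values, descending
energy_rank1 = sv[0] ** 2 / np.sum(sv ** 2)    # share of energy in the best rank-1 approximation
print(f"shape={Yn.shape}, sigma1/sigma2={sv[0] / sv[1]:.1f}, rank-1 energy={energy_rank1:.2f}")
```

The remaining energy comes from frame-to-frame random fluctuation around the common mean spectrum, which spreads thinly over many small singular values.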

In summary, the column vectors of the time-frequency matrix of background noise are strongly correlated, so the noise time-frequency matrix is low-rank. Compared with background noise, the speech signal is zero or close to zero at most time-frequency points and large at only a few, so speech is somewhat sparse and is well described by a sparse matrix. It is therefore natural to borrow the theory of low-rank and sparse matrix decomposition to solve the speech enhancement problem. A Chinese patent discloses an unsupervised single-channel speech-noise separation method based on low-rank and sparse matrix decomposition (publication number CN102915742A). That method first transforms the noisy time-domain waveform to the time-frequency domain with the short-time Fourier transform to obtain the magnitude spectrum of the noisy speech; it then uses a low-rank and sparse matrix decomposition algorithm to decompose this magnitude spectrum into the sum of a noise magnitude spectrum, a speech magnitude spectrum and a residual-noise magnitude spectrum; finally, it reconstructs the time-domain speech waveform from the speech magnitude spectrum with the inverse short-time Fourier transform. The shortcoming of that method is that it imposes no non-negativity constraint on the low-rank and sparse decomposition, so the speech magnitude spectrum separated from the noisy magnitude spectrum can contain negative values. A real magnitude spectrum is a non-negative physical quantity and should never be negative. Negative magnitudes not only cause decomposition errors but also produce musical noise that is unpleasant to the ear, degrading the perceived quality of the speech.
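The negative-value problem can be seen with a generic unconstrained split. The sketch below is an illustration under assumptions and is not the algorithm of CN102915742A: it takes the best rank-1 approximation (truncated SVD) of a synthetic non-negative magnitude matrix as the "noise" and the signed residual as the "speech". For a strictly positive matrix the leading left singular vector is positive (Perron-Frobenius), and every residual column is orthogonal to it, so every non-zero residual column must contain negative entries.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic non-negative magnitude matrix: rank-1 noise floor + sparse strong "speech" bins + jitter.
Y = np.outer(rng.uniform(0.5, 1.5, 64), rng.uniform(0.5, 1.5, 100))
Y[rng.integers(0, 64, 300), rng.integers(0, 100, 300)] += rng.uniform(2.0, 5.0, 300)
Y += 0.1 * rng.random(Y.shape)

# Unconstrained split: truncated-SVD rank-1 part as "noise", signed residual as "speech".
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
L_svd = s[0] * np.outer(U[:, 0], Vt[0])
S_unconstrained = Y - L_svd

print("negative 'speech magnitudes':", int(np.sum(S_unconstrained < 0)), "of", S_unconstrained.size)
```

In the method of this patent, the low-rank part comes from NMF and the sparse part from a one-sided threshold, so neither can take negative values.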

The present invention provides a speech enhancement method based on the principle of non-negative low-rank and sparse matrix decomposition. The method decomposes the noisy speech magnitude spectrum under non-negativity constraints, so the speech magnitude spectrum obtained by the decomposition is non-negative, which effectively improves the quality of the low-rank and sparse decomposition. The method is robust, requires no endpoint detection and has few parameters that are easy to tune, which makes it suitable for speech enhancement in strong noise.

Summary of the invention

The technical problem to be solved by the invention is to provide a speech enhancement method based on the principle of non-negative low-rank and sparse matrix decomposition, which separates speech from noise in noisy speech by performing a low-rank and sparse matrix decomposition in the time-frequency domain, with a low-rank constraint on the noise, a sparsity constraint on the speech, and non-negativity constraints on both.

The invention adopts the following technical solution: a speech enhancement method based on the principle of non-negative low-rank and sparse matrix decomposition, which uses non-negative low-rank and sparse matrix decomposition to separate the speech signal from noisy speech. The steps are as follows:

(1) Preprocess the discrete noisy speech signal; preprocessing consists of signal smoothing and framing;

(2) Apply the discrete Fourier transform to the framed noisy speech signal to obtain the noisy speech spectrum;

(3) In the frequency domain, take the spectral magnitudes of each speech frame as a column vector and arrange the columns in time order, so that the frames together form the noisy speech time-frequency matrix;

(4) Decompose the noisy speech time-frequency matrix with a non-negative low-rank and sparse matrix decomposition algorithm to obtain a non-negative low-rank matrix and a non-negative sparse matrix. The decomposition is:

$$Y = L + S + E \quad \text{subject to} \quad \operatorname{rank}(L) \le r,\ \|S\|_0 \le h,\ L \ge 0,\ S \ge 0;$$

where Y is the noisy speech time-frequency matrix; L is the low-rank matrix, corresponding to the noise magnitude spectrum; S is the sparse matrix, corresponding to the speech magnitude spectrum; $\|S\|_0$ is the number of non-zero elements of S; rank(L) is the rank of L; E is the residual matrix; and r and h are the upper limits of the low-rank and sparsity constraints;

(5) Reconstruct the enhanced speech spectrum from the sparse matrix S and the phase spectrum of the noisy speech, then apply the inverse Fourier transform to obtain the enhanced speech in the time domain.

In step (1), the discrete noisy speech signal is preprocessed as follows:

(1) Smooth the signal with the average of its P nearest-neighbor samples, in order to smooth the amplitude waveform of the noisy speech;

(2) Divide the noisy speech signal into frames using a Hamming window of length 200 samples, with an overlap of 80 samples between adjacent frames.
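A minimal NumPy sketch of the preprocessing and of steps (2) and (3): P-point smoothing, Hamming-window framing and the per-frame DFT. Several details are assumptions not stated in the source: the P-point average is centred on the current sample, the DFT length equals the 200-sample frame and only the K/2 + 1 non-negative-frequency bins of the real-input DFT are kept, and the matrix has one column per frame as described in step (3). The function names are hypothetical.

```python
import numpy as np

def smooth(y, P=3):
    """Replace each sample by the mean of a centred P-sample neighbourhood (assumed centring)."""
    return np.convolve(y, np.ones(P) / P, mode="same")

def frame_signal(y, frame_len=200, overlap=80):
    """Split y into Hamming-windowed frames; returns shape (frame_len, n_frames)."""
    hop = frame_len - overlap                       # 120-sample frame shift
    n_frames = 1 + (len(y) - frame_len) // hop
    idx = np.arange(frame_len)[:, None] + hop * np.arange(n_frames)[None, :]
    return y[idx] * np.hamming(frame_len)[:, None]

def noisy_spectrum(y, frame_len=200, overlap=80, P=3):
    """Smooth, frame and DFT the signal; return the magnitude matrix (columns = frames) and phase."""
    frames = frame_signal(smooth(y, P), frame_len, overlap)
    spec = np.fft.rfft(frames, axis=0)              # one-sided DFT of each real frame
    return np.abs(spec), np.angle(spec)

# Usage: a 440 Hz tone in white noise, sampled at 8 kHz for one second.
fs = 8000
t = np.arange(fs) / fs
y = np.sin(2 * np.pi * 440 * t) + 0.3 * np.random.default_rng(0).standard_normal(fs)
mag, phase = noisy_spectrum(y)                      # mag: (101, 66), one column per frame
```

With a 200-point DFT at 8 kHz the bin spacing is 40 Hz, so the tone falls in bin 11.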

The low-rank matrix L and the sparse matrix S are computed as follows:

(1) Initialization: $Y_0 = Y$; $L_0 = S_0 = \mathbf{0}_{N\times K}$;

initial iteration count $i = 1$; maximum number of iterations $i_{\max} = 10^3$; relative error threshold $\delta = 10^{-3}$;

(2) Update the low-rank matrix with NMF: $(W, H) = \mathrm{NMF}(Y_{i-1})$, $L_i = WH$, with $W \in \mathbb{R}^{N\times r}$, $H \in \mathbb{R}^{r\times K}$;

where NMF denotes non-negative matrix factorization, W and H are the rank-r NMF factors, and the cost function of the NMF is the Itakura-Saito divergence;

(3) Update the sparse matrix with the soft-thresholding operator: $S_i = (Y_{i-1} - L_i + S_{i-1} > \lambda) \odot (Y_{i-1} - L_i + S_{i-1} - \lambda)$;

where $\odot$ denotes the element-wise product, the comparison $(\cdot > \lambda)$ gives a 0/1 indicator matrix, and λ is a threshold constant; λ depends on the noise level, and the recommended value is λ = σ, where σ is the standard deviation of the noise;

(4) Update the superposition matrix: $Y_i = L_i + S_i$;

(5) If i reaches the maximum number of iterations ($i = i_{\max}$) or the relative error falls below δ, stop iterating and output $L_i$ and $S_i$ as the estimates of L and S; otherwise set i = i + 1 and return to step (2).
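The sketch below shows the two building blocks of the iteration, Itakura-Saito NMF and the one-sided soft-thresholding operator, inside a simple alternating loop. It is a sketch under stated assumptions, not the patent's exact procedure: the low-rank part is refit to Y − S and the threshold is applied to Y − L (a common alternating low-rank-plus-sparse scheme swapped in for illustration), W and H are warm-started between iterations, and the stopping rule uses the relative change of S as the "relative error". The IS-NMF updates are the standard β-divergence multiplicative updates with β = 0. All names are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards divisions; the IS divergence is undefined for zero entries

def is_nmf_updates(V, W, H, n_iter=20):
    """Multiplicative updates for NMF under the Itakura-Saito divergence (beta = 0)."""
    V = V + EPS
    for _ in range(n_iter):
        Vh = W @ H + EPS
        H = H * (W.T @ (V / Vh**2)) / (W.T @ (1.0 / Vh))
        Vh = W @ H + EPS
        W = W * ((V / Vh**2) @ H.T) / ((1.0 / Vh) @ H.T)
    return W, H

def soft_threshold_nonneg(X, lam):
    """(X > lam) ⊙ (X - lam): one-sided soft threshold, never negative."""
    return (X > lam) * (X - lam)

def nlsmd(Y, r=1, lam=0.3, i_max=1000, delta=1e-3, seed=0):
    """Alternating non-negative low-rank + sparse split (illustrative variant, see lead-in)."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.5, 1.5, (Y.shape[0], r))   # positive init keeps W and H positive
    H = rng.uniform(0.5, 1.5, (r, Y.shape[1]))
    S = np.zeros_like(Y, dtype=float)
    L = W @ H
    for _ in range(i_max):
        W, H = is_nmf_updates(Y - S, W, H)      # for Y >= 0, lam >= 0: S <= Y, so Y - S >= 0
        L = W @ H                               # non-negative low-rank estimate
        S_new = soft_threshold_nonneg(Y - L, lam)
        change = np.linalg.norm(S_new - S) / max(np.linalg.norm(S_new), EPS)
        S = S_new
        if change < delta:                      # assumed form of the relative-error test
            break
    return L, S

# Usage: rank-1 "noise" magnitudes plus 50 sparse "speech" peaks of height 4.
rng = np.random.default_rng(0)
a, b = rng.uniform(0.5, 1.5, 64), rng.uniform(0.5, 1.5, 100)
mask = np.zeros((64, 100), dtype=bool)
mask.flat[rng.choice(mask.size, 50, replace=False)] = True
Y = np.outer(a, b) + 4.0 * mask
L, S = nlsmd(Y, r=1, lam=0.3)
```

Because L is a product of positive factors and S is a one-sided threshold, both outputs are non-negative by construction, which is the property the method relies on.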

In step (5), the enhanced speech spectrum is reconstructed from the sparse matrix and the phase spectrum of the noisy speech:

$$\hat{X}(n,k) = S(n,k)\, e^{\,j\angle Y(n,k)}$$

where ∠Y(n,k) is the phase of the noisy speech spectrum, S is the sparse matrix, S(n,k) is the spectral magnitude given by the sparse matrix, $\hat{X}(n,k)$ is the reconstructed enhanced speech spectrum, n is the time-frame index and k is the frequency index.
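The phase reconstruction is a single element-wise operation. The sketch below uses random stand-in matrices (assumptions, not data from the source) to show that the result keeps the magnitude S(n,k) and borrows the noisy phase ∠Y(n,k); the function name is hypothetical.

```python
import numpy as np

def reconstruct_spectrum(S, noisy_phase):
    """Combine the sparse-matrix magnitudes S(n, k) with the noisy phase angle(Y(n, k))."""
    return S * np.exp(1j * noisy_phase)

rng = np.random.default_rng(0)
Y_complex = rng.standard_normal((101, 66)) + 1j * rng.standard_normal((101, 66))  # stand-in noisy spectrum
S = np.maximum(np.abs(Y_complex) - 1.0, 0.0)     # stand-in non-negative speech magnitudes
X_hat = reconstruct_spectrum(S, np.angle(Y_complex))
```

Where S(n,k) = 0 the enhanced spectrum is zero and the phase is irrelevant, which is why reusing the noisy phase costs little.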

The speech enhancement method provided by the invention uses non-negative low-rank and sparse matrix decomposition, so all elements of the resulting low-rank and sparse matrices are non-negative. The method needs neither endpoint detection nor model training, is robust and has few parameters that are easy to tune, which makes it particularly suitable for speech enhancement in strong noise.

Description of the drawings

Fig. 1 is a block diagram of the speech enhancement system of the invention.

Detailed description of the embodiments

The invention is further described with reference to the drawing. Referring to Fig. 1, the speech enhancement method based on the principle of non-negative low-rank and sparse matrix decomposition comprises the following steps:

1) Preprocess the noisy speech signal y(t) (preprocessing 101). Preprocessing 101 consists of signal smoothing and framing, to facilitate subsequent processing. Smoothing replaces the current value of the noisy speech signal with the average of the P nearest-neighbor samples of y(t), in order to smooth the amplitude waveform of the noisy speech. In the invention P = 3, i.e. each sample is replaced by the mean of three neighboring samples. Framing uses a Hamming window of length 200 samples, with an overlap of 80 samples between adjacent frames;

2) Apply the DFT (discrete Fourier transform) 102 to the framed noisy speech signal to obtain the signal spectrum, which consists of the magnitude spectrum |Y(n,k)| (104) and the phase spectrum ∠Y(n,k) (103). Here n is the frame index, n = 1, 2, ..., N; k is the frequency index, k = 1, 2, ..., K; N is the total number of frames; and K is the number of Fourier transform points;

3) In the frequency domain, arrange the magnitude spectra 104 of the frames in order as column vectors, so that the frames form an N × K noisy time-frequency matrix Y.

4) Apply NLSMD (non-negative low-rank and sparse matrix decomposition) 105 to the noisy time-frequency matrix Y to compute the non-negative low-rank matrix L and the non-negative sparse matrix S:

$$Y = L + S + E \quad \text{subject to} \quad \operatorname{rank}(L) \le r,\ \|S\|_0 \le h,\ L \ge 0,\ S \ge 0$$

Here L corresponds to the noise magnitude spectrum; S corresponds to the speech magnitude spectrum; $\|S\|_0$ is the number of non-zero elements of S; E is the residual matrix; rank(L) is the rank of L; r and h are the upper limits of the low-rank and sparsity constraints. Comparative tests showed that r = 1 to 3 gives good noise reduction.

NLSMD (non-negative low-rank and sparse matrix decomposition) 105 is computed as follows:

1. Initialization: $Y_0 = Y$; $L_0 = S_0 = \mathbf{0}_{N\times K}$; iteration count $i = 1$; maximum number of iterations $i_{\max} = 10^3$; relative error threshold $\delta = 10^{-3}$;

2. Update the low-rank matrix with non-negative matrix factorization: $(W, H) = \mathrm{NMF}(Y_{i-1})$, $L_i = WH$, with $W \in \mathbb{R}^{N\times r}$, $H \in \mathbb{R}^{r\times K}$;

where $L_i$ is the estimate of L after the i-th iteration, NMF denotes non-negative matrix factorization, and W and H are the rank-r NMF factors. Because W and H are non-negative, $L_i$ is necessarily non-negative as well. The cost function of the NMF algorithm can be the Euclidean distance, the Kullback-Leibler divergence or the Itakura-Saito divergence; comparative tests showed that the Itakura-Saito divergence gives the best results. The invention therefore computes L with NMF based on the Itakura-Saito divergence.

3. Update the sparse matrix $S_i$ with the soft-thresholding operator: $S_i = (Y_{i-1} - L_i + S_{i-1} > \lambda) \odot (Y_{i-1} - L_i + S_{i-1} - \lambda)$;

where $\odot$ denotes the element-wise product and λ is the threshold; the value of λ depends on the noise intensity, and the recommended value is λ = σ, where σ is the standard deviation of the noise.

4. Update the superposition matrix: $Y_i = L_i + S_i$;

5. If i reaches the maximum number of iterations ($i = i_{\max}$) or the relative error falls below δ, stop iterating and output $L_i$ and $S_i$ as the estimates of L and S; otherwise set i = i + 1 and return to step 2 to continue iterating;

5) Reconstruct the enhanced speech spectrum from the sparse matrix S and the phase spectrum of the noisy speech. Because the human ear is insensitive to phase, the phase ∠Y(n,k) of the noisy speech spectrum can replace the phase of the enhanced speech, giving the complex spectrum $\hat{X}(n,k)$ of the enhanced speech:

$$\hat{X}(n,k) = S(n,k)\, e^{\,j\angle Y(n,k)}$$

6) Expand the complex spectrum matrix of the enhanced speech into a vector and apply the IDFT (inverse discrete Fourier transform) 106 to obtain the discrete-time representation of the enhanced speech, where the function vec denotes concatenating the column vectors of a matrix, in frame order, into a one-dimensional vector.
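Because adjacent frames overlap by 80 samples, one practical way to realize the synthesis step is a per-frame inverse DFT followed by overlap-add. The sketch below swaps in weighted overlap-add (sum of window-weighted frames divided by the summed squared window), a standard short-time Fourier synthesis that is an assumption here, not necessarily the patent's exact procedure. It also assumes one-sided spectra from a 200-point real-input DFT; the function name is hypothetical.

```python
import numpy as np

def overlap_add(X_hat, frame_len=200, overlap=80):
    """Inverse one-sided DFT of each column, then weighted overlap-add with a Hamming window."""
    hop = frame_len - overlap
    win = np.hamming(frame_len)
    frames = np.fft.irfft(X_hat, n=frame_len, axis=0)   # (frame_len, n_frames), real-valued
    n_frames = frames.shape[1]
    out = np.zeros(frame_len + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    for m in range(n_frames):
        seg = slice(m * hop, m * hop + frame_len)
        out[seg] += frames[:, m] * win                  # re-window each frame
        norm[seg] += win ** 2                           # accumulate window energy
    return out / np.maximum(norm, 1e-8)                 # Hamming never reaches 0, so norm > 0

# Round trip: frame a random signal, take its spectrum, and resynthesize it.
rng = np.random.default_rng(0)
y = rng.standard_normal(1000)
hop, n_frames = 120, 1 + (1000 - 200) // 120
idx = np.arange(200)[:, None] + hop * np.arange(n_frames)[None, :]
X = np.fft.rfft(y[idx] * np.hamming(200)[:, None], axis=0)
y_rec = overlap_add(X)                                  # matches y on the covered samples
```

With an unmodified spectrum the round trip is exact; when the magnitudes are replaced by S(n,k), the same division by the summed squared window gives a least-squares compromise between overlapping frames.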

Claims

1. A speech enhancement method based on the principle of non-negative low-rank and sparse matrix decomposition, which uses non-negative low-rank and sparse matrix decomposition to separate the speech signal from noisy speech, wherein: the discrete noisy speech signal is preprocessed, preprocessing consisting of signal smoothing and framing; the discrete Fourier transform is applied to the framed noisy speech signal to obtain the noisy speech spectrum; in the frequency domain, the spectral magnitudes of each speech frame are taken as a column vector and the columns are arranged in time order, so that the frames form the noisy speech time-frequency matrix; the noisy speech time-frequency matrix is decomposed with a non-negative low-rank and sparse matrix decomposition algorithm to obtain a non-negative low-rank matrix and a non-negative sparse matrix; the decomposition is:

$$Y = L + S + E \quad \text{subject to} \quad \operatorname{rank}(L) \le r,\ \|S\|_0 \le h,\ L \ge 0,\ S \ge 0;$$

where Y is the noisy speech time-frequency matrix; L is the low-rank matrix, corresponding to the noise magnitude spectrum; S is the sparse matrix, corresponding to the speech magnitude spectrum; $\|S\|_0$ is the number of non-zero elements of S; rank(L) is the rank of L; E is the residual matrix; r and h are the upper limits of the low-rank and sparsity constraints; the enhanced speech spectrum is reconstructed from the sparse matrix S and the phase spectrum of the noisy speech, and the enhanced time-domain speech is then obtained by the inverse Fourier transform; characterized in that the low-rank matrix L and the sparse matrix S are computed as follows:

(1) Initialization: $Y_0 = Y$; $L_0 = S_0 = \mathbf{0}_{N\times K}$;

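(2) Update the low-rank matrix with NMF: $(W, H) = \mathrm{NMF}(Y_{i-1})$, $L_i = WH$, with $W \in \mathbb{R}^{N\times r}$, $H \in \mathbb{R}^{r\times K}$;

where NMF denotes non-negative matrix factorization, W and H are the rank-r NMF factors, and the cost function of the NMF is the Itakura-Saito divergence;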

(3) Update the sparse matrix with the soft-thresholding operator: $S_i = (Y_{i-1} - L_i + S_{i-1} > \lambda) \odot (Y_{i-1} - L_i + S_{i-1} - \lambda)$;

where $\odot$ denotes the element-wise product and λ is a threshold constant; λ depends on the noise level, and the recommended value is λ = σ, where σ is the standard deviation of the noise;

(4) Update the superposition matrix: $Y_i = L_i + S_i$;