CN109215677B - Wind noise detection and suppression method and device suitable for voice and audio - Google Patents
- Publication number
- CN109215677B (application CN201810935974.7A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0264—Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The invention provides a wind noise detection and suppression method. In one embodiment, the method comprises: performing complex coherence function estimation on the speech or audio to obtain the sound source localization angle of each pair of microphones in each speech frame, and from these the sound source localization angle variance of the frame; and determining whether wind noise exists in the speech frame according to the sound source localization angle variance. Experiments show that the method can detect and suppress strong non-stationary wind noise in real time without noticeable speech or audio distortion.
Description
Technical Field
The present application relates to the field of noise processing, and in particular to a method and an apparatus for real-time detection and suppression of wind noise in voice and audio.
Background
Conventional single-channel speech enhancement methods assume that the noise is stationary, so that the stationary noise power spectrum can be estimated by a noise estimation method and the stationary noise then suppressed [1]. Multi-channel speech enhancement methods can exploit the spatial separation between the target speech and the interfering noise to perform spatial filtering, for example by beamforming, and thereby suppress both stationary and non-stationary noise [2]. Current deep-learning-based methods are computationally expensive, and their environmental adaptability and generalization capability still need improvement.
Wind noise is strongly non-stationary. It is also not a point sound source and has no obvious directivity, so neither conventional single-channel speech enhancement nor multi-channel methods such as spatial filtering can detect and suppress it effectively. In recent years, some researchers have proposed detecting and suppressing wind noise with deep learning or non-negative matrix factorization [3]. These methods are computationally expensive, generally cannot guarantee real-time operation, and are therefore not broadly applicable to real-time, low-power communication.
Disclosure of Invention
In a first aspect, an embodiment of the present invention provides a wind noise detection method. The method comprises the following steps: receiving speech and/or audio signals from M microphones, where M is an integer greater than 1; performing complex coherence function estimation on the speech and/or audio signals to obtain complex coherence function values of the k-th frequency band short-time spectrum of the l-th frame for the i-th and j-th microphones, where l and k are natural numbers, i = 1, 2, ..., M, j = 1, 2, ..., M, and k = 1, 2, ..., K; obtaining the sound source localization angle of the i-th and j-th microphones in the l-th frame according to the complex coherence function values; determining the sound source localization angle variance in the l-th frame according to the sound source localization angles of the M microphones in the l-th frame; and determining whether wind noise exists in the l-th frame according to the sound source localization angle variance.
In a second aspect, a wind noise suppression method is provided. The method comprises the following steps: receiving speech and/or audio signals from M microphones, where M is an integer greater than 1; performing complex coherence function estimation on the speech and/or audio signals to obtain complex coherence function values of the k-th frequency band short-time spectrum of the l-th frame for the i-th and j-th microphones, where l and k are natural numbers, i = 1, 2, ..., M, and j = 1, 2, ..., M; obtaining the sound source localization angle of the i-th and j-th microphones in the l-th frame according to the complex coherence function values; determining the sound source localization angle variance in the l-th frame according to the sound source localization angles; determining a wind noise gain function in the l-th frame according to the sound source localization angle variance; and suppressing wind noise present in the speech and/or audio signals according to the wind noise gain function of the l-th frame.
In a third aspect, a wind noise detection apparatus is provided. The wind noise detection apparatus includes: a receiving module configured to receive speech and/or audio signals from M microphones, where M is an integer greater than 1; an estimation module configured to perform complex coherence function estimation on the speech and/or audio signals to obtain complex coherence function values of the k-th frequency band short-time spectrum of the l-th frame for the i-th and j-th microphones, where l and k are natural numbers, i = 1, 2, ..., M, and j = 1, 2, ..., M; an angle calculation module configured to obtain the sound source localization angle of the i-th and j-th microphones in the l-th frame according to the complex coherence function values; an angle variance determining module configured to determine the sound source localization angle variance in the l-th frame according to the sound source localization angles; and a wind noise determining module configured to determine whether wind noise exists in the l-th frame according to the sound source localization angle variance.
In a fourth aspect, a wind noise suppression apparatus is provided. The wind noise suppression apparatus includes: a receiving module configured to receive speech and/or audio signals from M microphones, where M is an integer greater than 1; an estimation module configured to perform complex coherence function estimation on the speech and/or audio signals to obtain complex coherence function values of the k-th frequency band short-time spectrum of the l-th frame for the i-th and j-th microphones, where l and k are natural numbers, i = 1, 2, ..., M, and j = 1, 2, ..., M; an angle calculation module configured to obtain the sound source localization angle of the i-th and j-th microphones in the l-th frame according to the complex coherence function values; an angle variance determining module configured to determine the sound source localization angle variance in the l-th frame according to the sound source localization angles; a wind noise gain determining module configured to determine a wind noise gain function of the l-th frame according to the sound source localization angle variance; and a suppression module configured to suppress wind noise present in the speech and/or audio signals according to the wind noise gain function of the l-th frame.
In a fifth aspect, a computer-readable storage medium is provided. The medium comprises instructions which, when run on a computer, cause the computer to perform the method according to the first or second aspect.
In a sixth aspect, a computer program product containing instructions is provided. When run on a computer, the instructions cause the computer to perform the method according to the first or second aspect.
According to the embodiments of the invention, wind noise is effectively suppressed by applying wind noise detection and wind noise suppression to the speech and/or audio signals, while distortion of the speech and audio is avoided as far as possible so that speech and audio quality is preserved. Experiments show that the method can detect and suppress strong non-stationary wind noise in real time without noticeable speech or audio distortion.
Drawings
FIG. 1 is a schematic diagram of a signal model;
FIG. 2 is a schematic block diagram of a multi-microphone wind noise suppression according to an embodiment of the present invention;
FIG. 3 is a comparison of the magnitude squared of the complex coherence function for wind noise and for directional target speech;
FIG. 4 is a schematic illustration of the sound source localization angle over time for wind noise and for directional target speech;
FIG. 5 is a schematic block diagram of full-band wind noise detection based on the magnitude squared and phase of the complex coherence function;
FIG. 6 is a schematic diagram of sub-band wind noise detection based on the magnitude squared and phase of the complex coherence function;
FIG. 7 is a schematic diagram of a gain function protection strategy based on speech harmonic characteristics and high and low frequency energy ratios;
FIG. 8 is a graph illustrating the processing of wind noise detection and suppression using an embodiment of the present invention; wherein (a) is a time domain map of speech contaminated by wind noise; (b) is a speech spectrogram polluted by wind noise; (c) is a speech time domain graph after wind noise suppression; (d) is a speech spectrogram with wind noise suppression.
Detailed Description
The embodiment of the invention provides a low-complexity, real-time wind noise detection and suppression method. It is suitable for voice and audio signals and can be applied in real-time voice and audio communication systems as well as in non-real-time voice and audio signal enhancement. The embodiment performs complex coherence function estimation on the speech or audio to obtain the sound source localization angle of each pair of microphones in each speech frame, and from these the sound source localization angle variance of the frame; it then determines whether wind noise exists in the speech frame according to the sound source localization angle variance.
In another embodiment of the present invention, a wind noise gain function for a speech or audio frame may be determined from the sound source localization angle variance of that frame; wind noise present in the speech and/or audio signal is then suppressed according to the wind noise gain function of the frame.
During complex coherence function estimation, a large smoothing factor tends to blur instantaneous changes; for example, it can cause distortion where the energy is rising, such as at speech onsets. Conversely, a smoothing factor that is too small tends to cause valid speech to be suppressed together with wind noise where its energy is weak, such as at the end of an utterance. To this end, in one embodiment, dual or multiple smoothing factors, or an adaptive smoothing factor, may be employed for complex coherence function estimation. The complex coherence function estimate may be used to determine the mean magnitude squared of the complex coherence function for the speech or audio frame, which assists wind noise detection and suppression.
In one embodiment, a sub-band wind noise detection strategy may be employed: one or more of the K frequency bands are combined into a sub-band, and the sound source localization angles and angle variances of the M microphones, taken pairwise, are then determined in multiple sub-bands of the speech or audio frame. Wind noise is detected and suppressed on this basis.
In one embodiment, in view of the harmonic characteristics of voiced sounds, the harmonic-to-noise ratio at the l-th frame is used to protect voiced sounds.
In one embodiment, considering that wind noise has strong low-frequency energy while unvoiced sounds have strong high-frequency energy, unvoiced sounds can be protected using the high-to-low frequency energy ratio.
The present invention will be described below with reference to specific examples.
Fig. 1 is a schematic diagram of a signal model. Fig. 1 illustrates several kinds of sound signals: a target speech signal, a directional interferer, stationary noise, and wind noise. As an example, the speech environment contains speakers A and B, both of whom are speaking. If the voice of speaker A is the target of the speech processing, the voice of speaker B constitutes directional interference noise with respect to the target speech signal. The same environment may also contain other noise, such as noise from a car, and possibly wind-induced noise. Car noise is an example of stationary noise. Wind noise has characteristics that differ from both directional interference noise and stationary noise, as discussed below.
In order to collect the target speech signal, M microphones are placed in the speech environment. Suppose the signal $x_i(n)$ received by the i-th microphone is:
$$x_i(n) = s_i(n) + d_{t,i}(n) + d_{s,i}(n) + d_{w,i}(n) \tag{1}$$
where $s_i(n)$, $d_{t,i}(n)$, $d_{s,i}(n)$ and $d_{w,i}(n)$ are, respectively, the target speech or audio signal (hereinafter simply called speech, for convenience of description), the directional interference, the stationary noise, and the wind noise received by the i-th microphone; $i = 1, 2, \ldots, M$, where M is the number of microphones.
With multiple microphones (M > 1), the directional interference noise $d_{t,i}(n)$ can be effectively suppressed by beamforming, and the environmental noise can be suppressed by post-filtering; the wind noise suppression method provided by the embodiment of the invention is then applied to suppress the wind noise.
Fig. 2 is a schematic block diagram of multi-microphone wind noise suppression. As shown in Fig. 2, the wind noise suppression includes beamforming 21, post-processing 22, wind noise detection (23-24), wind noise gain estimation 25, and so on. To protect voiced and unvoiced sounds, harmonic-to-noise ratio and high-to-low frequency energy ratio estimation 26 may also be performed, followed by wind noise gain function estimation 27 based on speech characteristics.
With multiple microphones (M > 1), beamforming applied to the speech signals $x_i(n)$ ($i = 1, 2, \ldots, M$) picked up by the M microphones can effectively suppress the directional interference noise $d_{t,i}(n)$. Either fixed or adaptive beamforming may be used. Fixed beamforming methods include Delay-and-Sum Beamforming (DSB), Delay-and-Filtering (DFB), and Robust Superdirective Beamforming (RSB). Adaptive beamforming algorithms include the Generalized Sidelobe Canceller (GSC), the Minimum Variance Distortionless Response (MVDR) method, the Multi-channel Wiener Filter (MWF), and the like.
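As a rough illustration of the simplest fixed beamformer named above, delay-and-sum, the following sketch aligns each channel by a known integer sample delay and averages the result. The function name `delay_and_sum`, the integer delays, and the toy signals are assumptions made for illustration; the patent text does not specify an implementation.

```python
def delay_and_sum(channels, delays):
    """Delay-and-sum beamformer with known integer sample delays (illustrative).

    channels: list of M sample lists, one per microphone.
    delays:   list of M non-negative integers; channel i lags the reference
              by delays[i] samples for the look direction.
    """
    # Keep only the samples for which every delayed channel still has data.
    n_out = min(len(ch) - d for ch, d in zip(channels, delays))
    m = len(channels)
    return [
        sum(ch[t + d] for ch, d in zip(channels, delays)) / m
        for t in range(n_out)
    ]


# Hypothetical two-microphone example: microphone 2 hears the source 2 samples later.
source = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]
mic1 = source + [0.0, 0.0]
mic2 = [0.0, 0.0] + source
aligned = delay_and_sum([mic1, mic2], delays=[0, 2])
```

Signals arriving from the look direction add coherently after alignment, while signals with other delays are averaged down; practical systems would typically use fractional delays or frequency-domain steering instead of integer shifts.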
In the post-processing step, the beamformed signal can be filtered to remove environmental noise. Multi-channel post-filtering may use a coherence-based method, an energy-based method, or a combination of the two; spectral subtraction, subspace methods and the like may also be used to suppress stationary noise. In one example, spectral subtraction is used to pre-process the residual directional noise and stationary noise, in view of its stability and low computational cost.
Let $x(n)$ be the output obtained from the multi-microphone signals after beamforming and post-filtering. Neglecting the residual stationary noise:
$$x(n) = s(n) + d_w(n) \tag{2}$$
where $s(n)$ is the estimated target speech component and $d_w(n)$ is the residual wind noise. The frequency-domain representation of equation (2) is:
$$X(k,l) = S(k,l) + D_w(k,l) \tag{3}$$
where $X(k,l)$, $S(k,l)$ and $D_w(k,l)$ are the k-th frequency band short-time spectra of the l-th frame of $x(n)$, $s(n)$ and $d_w(n)$, respectively, which can be computed with the Fast Fourier Transform (FFT). Similarly, applying the FFT to the signal $x_i(n)$ received by the i-th microphone gives the corresponding k-th band short-time spectrum of the l-th frame, $X_i(k,l)$.
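The short-time spectrum $X(k,l)$ can be sketched as follows. This is a minimal illustration only: it uses a direct DFT rather than an FFT so that it needs nothing beyond the standard library, and the frame length, hop size, Hann window and the name `short_time_spectrum` are all assumptions not taken from the patent.

```python
import cmath
import math


def short_time_spectrum(x, frame_len=8, hop=4):
    """Return X[l][k] for k = 0..frame_len/2 using a Hann window and a direct DFT."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    spectra = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(frame_len)]
        spectra.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ])
    return spectra


# Hypothetical test tone that sits exactly on band k = 2 of an 8-point frame.
tone = [math.cos(2 * math.pi * 2 * n / 8) for n in range(16)]
X = short_time_spectrum(tone)
strongest_band = max(range(len(X[0])), key=lambda k: abs(X[0][k]))
```

Here `X[l][k]` plays the role of $X(k,l)$; running the same function on each microphone signal would give $X_i(k,l)$.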
After beamforming and post-processing, wind noise detection is performed on the signal. Fig. 5 is a schematic block diagram of full-band wind noise detection based on the magnitude squared and phase of the complex coherence function.
First, the complex coherence function is estimated with dual smoothing factors. Because wind noise is strongly non-stationary and high in energy, a dual-smoothing-factor estimate of the complex coherence function is proposed so that wind noise does not affect weak speech and audio.
The conventional complex coherence function estimation method uses a fixed smoothing factor, as given by equation (4) (the equation is not reproduced in this text), where $C_{ij}(k,l)$ is the complex coherence function value of the k-th frequency band short-time spectrum of the l-th frame for the i-th and j-th microphones; a small quantity greater than 0 is included to avoid division by zero; l and k are natural numbers, $i = 1, 2, \ldots, M$, and $j = 1, 2, \ldots, M$. $R_{ij}(k,l)$ denotes the auto-power and cross-power spectra: when $i = j$, $R_{ij}(k,l) = R_{ii}(k,l) = R_{jj}(k,l)$ is the auto-power spectrum; when $i \neq j$, it is the cross-power spectrum. $\alpha$ is a fixed smoothing factor, generally between 0 and 1. To reduce the estimation bias, $\alpha$ should be close to 1, but this distorts the onsets of speech and audio; when the wind noise energy is high in particular, it distorts the speech and audio signals for a long time.
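Since equation (4) itself is missing from this text, the sketch below uses the commonly used first-order recursive form as an assumption: the auto- and cross-power spectra are smoothed with $\alpha$, and the coherence is the cross-power spectrum normalized by the geometric mean of the auto-power spectra, with a small `delta` in the denominator. The exact placement of the small constant, the function names, and the toy inputs are hypothetical and may differ from the patent's formula.

```python
import cmath
import math


def update_power_spectra(state, x_i, x_j, alpha):
    """Recursively smooth auto- and cross-power spectra for one band (assumed form)."""
    r_ii, r_jj, r_ij = state
    r_ii = alpha * r_ii + (1 - alpha) * abs(x_i) ** 2
    r_jj = alpha * r_jj + (1 - alpha) * abs(x_j) ** 2
    r_ij = alpha * r_ij + (1 - alpha) * x_i * x_j.conjugate()
    return r_ii, r_jj, r_ij


def complex_coherence(state, delta=1e-12):
    """Normalized cross-power spectrum; delta guards against division by zero."""
    r_ii, r_jj, r_ij = state
    return r_ij / (math.sqrt(r_ii * r_jj) + delta)


def track(phase_differences, alpha):
    """Run the estimator over unit-magnitude frames with given inter-microphone phases."""
    state = (0.0, 0.0, 0j)
    for phi in phase_differences:
        x_i = 1 + 0j
        x_j = cmath.exp(-1j * phi)  # microphone j lags microphone i by phi radians
        state = update_power_spectra(state, x_i, x_j, alpha)
    return complex_coherence(state)


# Fixed phase difference: behaves like a directional source.
coherent = track([0.5] * 100, alpha=0.9)
# Phase difference that jumps from frame to frame: a stand-in for wind noise.
scattered = track([2.4 * m for m in range(100)], alpha=0.9)
```

With a fixed inter-microphone phase the magnitude of the estimate approaches 1 and its phase recovers the phase difference, whereas a phase that jumps between frames averages toward 0, which is the contrast the detection relies on.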
In one embodiment, the fixed-smoothing-factor problem is addressed with a dual-smoothing-factor strategy: a large smoothing factor $\alpha_1 \in [0.7, 0.9]$ is used to estimate the long-term complex coherence function, and a small smoothing factor $\alpha_2 \in [0.4, 0.6]$ is used to estimate the instantaneous complex coherence function. Both are computed with equation (4), except that $\alpha$ is replaced by the large smoothing factor $\alpha_1$ and the small smoothing factor $\alpha_2$, respectively. Three or more smoothing factors may also be used to estimate the complex coherence function several times in order to protect the speech and audio signals; this is a straightforward extension of the dual-smoothing-factor strategy and also falls within the scope of this patent.
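The effect of the two smoothing factors can be illustrated with a small experiment. The recursion below is an assumed first-order form (equation (4) is not available in this text), and the scene, a run of wind-like frames followed by a speech onset, is entirely hypothetical; only the ranges for $\alpha_1$ and $\alpha_2$ come from the description.

```python
import cmath
import math


def coherence_track(phase_differences, alpha, delta=1e-12):
    """Per-frame complex coherence of one band for unit-magnitude inputs (assumed recursion)."""
    r_ii = r_jj = 0.0
    r_ij = 0j
    out = []
    for phi in phase_differences:
        x_i, x_j = 1 + 0j, cmath.exp(-1j * phi)
        r_ii = alpha * r_ii + (1 - alpha) * abs(x_i) ** 2
        r_jj = alpha * r_jj + (1 - alpha) * abs(x_j) ** 2
        r_ij = alpha * r_ij + (1 - alpha) * x_i * x_j.conjugate()
        out.append(r_ij / (math.sqrt(r_ii * r_jj) + delta))
    return out


ALPHA_LONG = 0.8  # alpha_1, inside the stated range [0.7, 0.9]
ALPHA_INST = 0.5  # alpha_2, inside the stated range [0.4, 0.6]

# Hypothetical scene: 30 wind-like frames (jumping phase), then a speech onset at 0 rad.
phases = [2.4 * m for m in range(30)] + [0.0] * 3
c_long = coherence_track(phases, ALPHA_LONG)
c_inst = coherence_track(phases, ALPHA_INST)
msc_long = abs(c_long[-1]) ** 2  # magnitude squared, long-term estimate
msc_inst = abs(c_inst[-1]) ** 2  # magnitude squared, instantaneous estimate
```

Three frames after the onset, the instantaneous estimate has already risen toward 1 while the long-term estimate is still low, which is why a single large smoothing factor risks distorting speech onsets. How the two estimates are combined in the decision is not stated in the surviving text.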
Further, in one embodiment, an adaptive smoothing factor $\alpha$ may be employed. For example, $\alpha$ is adjusted adaptively according to the stationarity of the signal picked up by the microphones, the absolute value of the complex coherence function, or the signal-to-wind-noise ratio, in order to protect the voice and audio signals; this is also covered by this patent.
Next, full-band wind noise detection is performed based on the magnitude squared and phase of the complex coherence function.
Wind noise differs from a directional target speech or audio source in two respects. First, the magnitude squared of the complex coherence function of wind noise between different microphones is close to 0, whereas that of directional target speech or audio is close to 1. Fig. 3 plots the magnitude squared of the complex coherence function for wind noise and for a directional target speech source, confirming this difference. Second, wind noise is not a directional sound source, so its sound source localization angle varies irregularly between adjacent frames. For a directional source, the direction of the point source can be found by sound source localization, and the localization angle is continuous from frame to frame; environmental noise close to a diffuse field has a zero-phase characteristic and likewise shows good inter-frame continuity of the localization angle. Fig. 4 shows the localization angle over time for wind noise and for a directional target speech source: in the first 5 seconds the target speech source is localized at 0 degrees and the angle jitters only in silent segments, whereas in seconds 6 to 10, where wind noise is present, the angle jumps widely. Wind noise can therefore be detected from the continuity of the source direction, and the source angle variance can be computed over multiple frames; when the variance exceeds a threshold, the current frame can be preliminarily judged to be wind noise.
The magnitude squared of the complex coherence function is then also taken into account, and wind noise is detected with a dual-threshold decision, as shown in the block diagram of Fig. 5. The specific steps are as follows:
1) Perform pairwise microphone sound source localization based on the complex coherence function, using the GCC-PHAT (Generalized Cross-Correlation with Phase Transform) method, the SCOT-PHAT (Standard Cross-correlation Phase Transform) method, or a similar method. Let the sound source localization angle of the i-th and j-th microphones in the l-th frame be $\theta_{ij}(l)$. All or part of the frequency bands of the complex coherence function may be used for sound source localization.
2) Calculate the sound source localization angle variance at the l-th frame, i.e., the variance of the $M(M-1)/2$ values $\theta_{ij}(l)$ (equation (5), not reproduced in this text).
3) Calculate the mean magnitude squared of the complex coherence function for each frame (equation (6), not reproduced in this text), where $k_{low}$ and $k_{up}$ are the lower and upper limits of the frequency band; the minimum value of $k_{low}$ is 0 and the maximum value of $k_{up}$ is half the FFT frame length.
4) Perform wind noise detection according to the mean magnitude squared of the complex coherence function and the sound source localization angle variance; the detection criterion is equation (7) (not reproduced in this text).
The mean magnitude squared of the complex coherence function assists the wind noise detection based on the sound source localization angle variance and helps reduce the speech distortion that the detection might otherwise introduce.
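The four steps above can be sketched as follows. Because equations (5) to (7) are missing from this text, the decision rule here is an assumption: a frame is flagged as wind noise only when the angle variance across microphone pairs is large and the mean magnitude-squared coherence is small. The threshold values, function names and toy inputs are hypothetical, and the pairwise angles are taken as given rather than computed with GCC-PHAT.

```python
from statistics import pvariance


def mean_magnitude_squared(coherence, k_low, k_up):
    """Average |C(k, l)|^2 over bands k_low..k_up (inclusive) of one frame."""
    band = coherence[k_low:k_up + 1]
    return sum(abs(c) ** 2 for c in band) / len(band)


def detect_wind_fullband(pair_angles_deg, pair_coherences, k_low, k_up,
                         var_thresh=100.0, msc_thresh=0.5):
    """Return q(l): 1 if the frame is judged to be wind noise, else 0.

    Assumed rule: wind noise needs BOTH a large angle variance across the
    M(M-1)/2 microphone pairs AND a small mean magnitude-squared coherence.
    """
    angle_var = pvariance(pair_angles_deg)
    msc = sum(mean_magnitude_squared(c, k_low, k_up) for c in pair_coherences)
    msc /= len(pair_coherences)
    return 1 if (angle_var > var_thresh and msc < msc_thresh) else 0


# Hypothetical three-microphone frames (three pairs, four bands each).
speech_q = detect_wind_fullband([0.0, 1.0, -1.0], [[0.95 + 0j] * 4] * 3, 0, 3)
wind_q = detect_wind_fullband([-60.0, 20.0, 75.0], [[0.1 + 0.1j] * 4] * 3, 0, 3)
```

Requiring both conditions is one way to realize a dual-threshold decision in which the coherence check guards against flagging speech frames whose angle estimates happen to scatter.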
Fig. 6 presents an alternative to Fig. 5: sub-band wind noise detection based on the magnitude squared and phase of the complex coherence function.
When there are only two microphones, i.e., M = 2, the variance cannot be calculated with equation (5), and the decision of equation (7) would rest solely on the mean magnitude squared of the complex coherence function. To solve this problem and to reduce speech distortion in frequency bands unaffected by wind noise, a sub-band wind noise detection strategy is further proposed, exploiting the fact that the wind noise localization angle varies irregularly between adjacent sub-bands. The specific steps are as follows:
1) Combine multiple frequency bands into a sub-band, e.g., one sub-band per 500 Hz, and perform sound source localization in each sub-band; the sound source localization angle in each sub-band is $\theta_{ij}(\kappa, l)$, where $\kappa$ is the sub-band index;
2) Calculate the variance of the $M(M-1)/2$ values $\theta_{ij}(\kappa, l)$.
When M = 2 there is only one set, $\theta_{12}(\kappa, l)$, and the variance is calculated by equations (8) and (9) (not reproduced in this text), where $\kappa_{up}$ is the sub-band upper bound. When M = 2, the sound source angle variance used for wind noise detection in equation (7) can thus be calculated by equations (8) and (9);
3) Calculate the sub-band mean magnitude squared of the complex coherence function, $C_a(\kappa, l)$ and $C_b(\kappa, l)$, in the same way as equation (6), except that $k_{low}$ and $k_{up}$ are set to the limits of each sub-band;
4) Perform sub-band wind noise detection according to the sub-band mean magnitude squared of the complex coherence function and the sub-band sound source angle variance. The detection criterion (not reproduced in this text) compares these quantities with thresholds, and $q(k, l) = 1$ indicates that the k-th sub-band of the l-th frame is wind noise.
The sub-band wind noise detection scheme suppresses wind noise strongly while avoiding distortion of speech and audio in frequency bands free of wind noise.
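A sketch of the sub-band variant for two microphones follows. Equations (8) to (10) are not available in this text, so the rule used here is an assumption: the variance of $\theta_{12}(\kappa, l)$ is taken across the sub-bands of one frame, and a sub-band is flagged when that variance is large and its own mean magnitude-squared coherence is small. The sampling rate, FFT length, thresholds and function names are hypothetical.

```python
from statistics import pvariance


def subband_edges(n_fft, fs, width_hz=500.0):
    """Group FFT bins 0..n_fft/2 into sub-bands roughly width_hz wide.

    Returns a list of (k_low, k_up) pairs with inclusive limits.
    """
    bins_per_subband = max(1, round(width_hz * n_fft / fs))
    n_bins = n_fft // 2 + 1
    return [(k_low, min(k_low + bins_per_subband, n_bins) - 1)
            for k_low in range(0, n_bins, bins_per_subband)]


def detect_wind_subbands(theta_12, subband_msc, var_thresh=100.0, msc_thresh=0.5):
    """Return q(kappa, l) for one frame of a two-microphone array (assumed rule)."""
    angle_var = pvariance(theta_12)  # spread of the angle across sub-bands
    return [1 if (angle_var > var_thresh and msc < msc_thresh) else 0
            for msc in subband_msc]


edges = subband_edges(n_fft=512, fs=16000)
q = detect_wind_subbands(theta_12=[-50.0, 40.0, 10.0, 5.0],
                         subband_msc=[0.1, 0.2, 0.9, 0.95])
```

In this toy frame only the two low sub-bands, where the coherence is small, are flagged, so the higher sub-bands carrying coherent speech would be left untouched.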
When wind noise is detected, the harmonic-to-noise ratio and the high-to-low frequency energy ratio can be combined to protect the speech while the wind noise is suppressed.
Fig. 7 is a schematic diagram of a gain function protection strategy based on speech harmonic characteristics and the high-to-low frequency energy ratio. The upper part of Fig. 7 illustrates the process of protecting voiced sounds with the harmonic-to-noise ratio.
Voiced sounds have harmonic structure. To extract the fundamental frequency of the harmonics, a cepstrum-based or frequency-domain method may be used. The specific steps are as follows.
1) First, calculate the power cepstrum coefficients $c(\lambda, l)$, where $\lambda$ is the cepstrum coefficient index;
2) Judge whether the current frame is voiced according to $c(\lambda, l)$ (decision rule not reproduced in this text), where $c_{th}$ is a threshold, and $\lambda_{min}$ and $\lambda_{max}$ are the minimum and maximum cepstrum coefficient indices corresponding to the voiced fundamental frequency;
3) Set the largest coefficient of $c(\lambda, l)$ between $\lambda_{min}$ and $\lambda_{max}$ to 0 and estimate the harmonic-to-noise ratio $HNR(k, l)$; the value of the first speech protection gain function in the k-th band of the l-th frame is then obtained from it (formulas not reproduced in this text).
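Steps 1) and 2) can be sketched with a direct power cepstrum. The decision rule used here, comparing the largest cepstral coefficient between $\lambda_{min}$ and $\lambda_{max}$ with $c_{th}$, is an assumption, since the patent's formula is missing from this text; the harmonic-to-noise ratio of step 3) is not attempted. The direct DFT, the `eps` floor, the threshold value and the toy signals are all hypothetical.

```python
import cmath
import math


def power_cepstrum(x, eps=1e-10):
    """Power cepstrum c(lambda): inverse DFT of the log power spectrum of one frame."""
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    log_power = [math.log(abs(s) ** 2 + eps) for s in spectrum]  # eps avoids log(0)
    return [sum(log_power[k] * cmath.exp(2j * math.pi * k * q / n)
                for k in range(n)).real / n
            for q in range(n)]


def is_voiced(c, lam_min, lam_max, c_th):
    """Return (voiced flag, index of the largest coefficient in [lam_min, lam_max])."""
    peak = max(range(lam_min, lam_max + 1), key=lambda q: c[q])
    return c[peak] > c_th, peak


# Hypothetical voiced frame: a pulse train with a pitch period of 20 samples.
pulse_train = [1.0 if t % 20 == 0 else 0.0 for t in range(160)]
voiced, pitch_period = is_voiced(power_cepstrum(pulse_train), 10, 30, c_th=1.0)

# Hypothetical non-harmonic frame: a single click, whose spectrum is flat.
click = [1.0] + [0.0] * 159
click_voiced, _ = is_voiced(power_cepstrum(click), 10, 30, c_th=1.0)
```

The cepstral peak index corresponds to the pitch period in samples, so for a sampling rate `fs` the fundamental frequency would be `fs / pitch_period`.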
Considering the long-term spectral characteristics, wind noise generally has stronger low-frequency energy, whereas unvoiced sounds generally have stronger high-frequency energy. The lower part of Fig. 7 illustrates the process of protecting unvoiced sounds according to the high-to-low frequency energy ratio. The specific steps are as follows.
1) Calculate the high-frequency energy $P_{high}(l)$ and the low-frequency energy $P_{low}(l)$, where $K_{min}$, $K_{mid}$ and $K_{high}$ are the minimum, middle and highest bands used to compute the low- and high-frequency energies;
2) Calculate the high-to-low frequency energy ratio of each frame, $PR(l) = P_{high}(l)/P_{low}(l)$;
3) Protect unvoiced sounds according to $PR(l)$: the value of the second speech protection gain function $G_{p,2}(k, l)$ in the k-th band of the l-th frame is determined by comparing $PR(l)$ with a threshold $PR_{th}$ (formula not reproduced in this text). $G_{p,2}(k, l)$ takes the value 1 only in the mid-to-high frequency bands where unvoiced sounds may occur.
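The three steps can be sketched as below. The band split (low energy over $K_{min}$ up to but excluding $K_{mid}$, high energy over $K_{mid}$ to $K_{high}$), the rule that protection applies when $PR(l)$ exceeds $PR_{th}$, and the threshold value are assumptions, because the energy and gain formulas are missing from this text; the function name is hypothetical.

```python
def unvoiced_protection_gain(spectrum, k_min, k_mid, k_high, pr_th=1.0, eps=1e-12):
    """Return (G_p2 per band, PR) for one frame under the assumed rule.

    Low-frequency energy is summed over bands k_min..k_mid-1 and
    high-frequency energy over bands k_mid..k_high (both assumptions).
    """
    p_low = sum(abs(spectrum[k]) ** 2 for k in range(k_min, k_mid))
    p_high = sum(abs(spectrum[k]) ** 2 for k in range(k_mid, k_high + 1))
    pr = p_high / (p_low + eps)  # eps avoids division by zero
    protect = pr > pr_th
    gain = [1.0 if (protect and k_mid <= k <= k_high) else 0.0
            for k in range(len(spectrum))]
    return gain, pr


# Hypothetical frames with eight bands each.
wind_like = [2.0, 2.0, 2.0, 2.0, 0.1, 0.1, 0.1, 0.1]      # energy concentrated low
unvoiced_like = [0.1, 0.1, 0.1, 0.1, 2.0, 2.0, 2.0, 2.0]  # energy concentrated high
g_wind, pr_wind = unvoiced_protection_gain(wind_like, 0, 4, 7)
g_unvoiced, pr_unvoiced = unvoiced_protection_gain(unvoiced_like, 0, 4, 7)
```

Only the mid-to-high bands of the high-frequency-dominated frame receive a protection gain of 1, matching the statement that $G_{p,2}(k, l)$ is 1 only where unvoiced sounds may occur.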
When full-band wind noise detection is employed, the wind noise gain function is G1(k, l) ═ 1-q (l), i.e., 1-q (l) values are assigned to the gain function for all bands in the l-th frame. When the molecular band wind noise detection is adopted, the wind noise gain function is G1(k, l) ═ 1-q (k, l), i.e., 1-q (k, l) values are assigned to the gain function of the kth subband in the l-th frame.
Combining the two speech protection strategies above, the final wind noise suppression gain function is:
$G_w(k, l) = \max\{G_1(k, l),\ G_{p,1}(k, l),\ G_{p,2}(k, l)\}$ (14)
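Equation (14) takes a per-band maximum, so a band is attenuated only when neither protection gain asks for it to be kept. A minimal sketch follows; the function names are hypothetical, and in the sub-band case it assumes one value of q per band, which simplifies the sub-band grouping described in the text.

```python
def wind_noise_gain(q, num_bands):
    """G1(k, l) = 1 - q(l) for full-band detection, or 1 - q(k, l) per band."""
    if isinstance(q, (int, float)):
        return [1.0 - q] * num_bands      # same value for every band
    return [1.0 - qk for qk in q]         # one value per band (assumed layout)

def final_gain(g1, gp1, gp2):
    """Equation (14): element-wise maximum of the three gain functions."""
    return [max(a, b, c) for a, b, c in zip(g1, gp1, gp2)]

# Wind detected in the whole frame (q = 1), with two protection gains.
g1 = wind_noise_gain(1.0, 4)
g_w = final_gain(g1, [0.0, 0.5, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0])
```

With these inputs only the first band is fully suppressed; the remaining bands keep the larger of the two protection gains.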
The final enhanced time-domain speech signal is obtained by inverse FFT and overlap-add, namely:
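The synthesis formula is not reproduced in this text. As a generic illustration only, the sketch below applies a per-bin gain to each frame spectrum, inverts it, and overlap-adds the frames. The full-length spectrum layout, the hop size, the absence of a synthesis window, and the naive inverse DFT are all assumptions rather than the patented formula.

```python
import cmath
import math

def idft_real(spectrum):
    """Real part of the inverse DFT (naive O(N^2) form, for illustration)."""
    n = len(spectrum)
    return [
        sum(spectrum[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)).real / n
        for t in range(n)
    ]

def enhance_and_overlap_add(spectra, gains, hop):
    """Apply per-bin gains, invert each frame, and sum frames offset by `hop` samples."""
    n = len(spectra[0])
    out = [0.0] * (hop * (len(spectra) - 1) + n)
    for l, (spec, gain) in enumerate(zip(spectra, gains)):
        frame = idft_real([g * x for g, x in zip(gain, spec)])
        for t, value in enumerate(frame):
            out[l * hop + t] += value
    return out

# Two 4-point frames with 50% overlap; the second frame's DC bin is halved.
enhanced = enhance_and_overlap_add(
    [[4.0, 0.0, 0.0, 0.0], [8.0, 0.0, 0.0, 0.0]],
    [[1.0, 1.0, 1.0, 1.0], [0.5, 1.0, 1.0, 1.0]],
    hop=2,
)
```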
Fig. 8 shows the processing effect of wind noise detection and suppression according to an embodiment of the present invention, where (a) is the time-domain waveform of speech contaminated by wind noise; (b) is the spectrogram of speech contaminated by wind noise; (c) is the time-domain waveform after wind noise suppression; and (d) is the spectrogram after wind noise suppression. The first 5 seconds are speech and the last 5 seconds are wind noise. As Fig. 8 shows, the speech signal is preserved and the wind noise is effectively suppressed.
Those skilled in the art will understand that all or part of the steps of the methods in the above embodiments may be implemented by a program, and that the program may be stored in a computer-readable storage medium. The storage medium is a non-transitory medium, such as a random access memory, a read-only memory, a flash memory, a hard disk, a solid state disk, a magnetic tape, a floppy disk, an optical disk, or any combination thereof.
The above describes only specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the appended claims.
Cited documents:
[1] Loizou P C. Speech enhancement: theory and practice [M]. CRC Press, 2013.
[2] Ward D, Brandstein M. Microphone arrays: signal processing techniques and applications [J]. 2001.
[3] Schmidt M N, Larsen J, Hsiao F T. "Wind noise reduction using non-negative sparse coding", in IEEE Workshop on Machine Learning for Signal Processing, Aug. 27-29, 2007.
Claims (16)
1. A wind noise detection method, the method comprising:
receiving speech and/or audio signals from M microphones, where M is an integer greater than 1;
performing complex coherence function estimation on the speech and/or audio signals to obtain complex coherence function values of the short-time spectra of the i-th microphone and the j-th microphone in the k-th frequency band of the l-th frame, wherein l and k are natural numbers, i = 1, 2, ..., M and j = 1, 2, ..., M;
obtaining the sound source localization angles of the i-th microphone and the j-th microphone in the l-th frame according to the complex coherence function values;
determining the sound source localization angle variance in the l-th frame according to the sound source localization angles;
and determining whether wind noise exists in the l-th frame according to the sound source localization angle variance.
2. The method of claim 1, wherein the complex coherence function values include a first complex coherence function value obtained by long-term complex coherence function estimation using a first smoothing factor and a second complex coherence function value obtained by instantaneous complex coherence function estimation using a second smoothing factor, the first smoothing factor being greater than the second smoothing factor; the method comprises determining a mean squared modulus of the complex coherence function for the l-th frame according to at least one of the first complex coherence function value and the second complex coherence function value; and determining whether wind noise exists in the l-th frame according to the sound source localization angle variance comprises determining whether wind noise exists in the l-th frame according to the sound source localization angle variance together with the mean squared modulus of the complex coherence function of the l-th frame.
3. The method of claim 2, wherein the first smoothing factor $\alpha_1 \in [0.7, 0.9]$ and the second smoothing factor $\alpha_2 \in [0.4, 0.6]$.
4. The method of claim 1, wherein:
obtaining the sound source localization angles of the i-th microphone and the j-th microphone in the l-th frame according to the complex coherence function values comprises determining the sound source localization angles of the i-th microphone and the j-th microphone in the k-th sub-band of the l-th frame, the k-th sub-band being formed by combining at least one frequency band;
determining the sound source localization angle variance in the l-th frame according to the sound source localization angles comprises determining the sound source localization angle variance in the k-th sub-band of the l-th frame according to the sound source localization angles of the i-th microphone and the j-th microphone in the k-th sub-band of the l-th frame; and determining whether wind noise exists in the l-th frame according to the sound source localization angle variance comprises determining whether wind noise exists in the k-th sub-band of the l-th frame according to the sound source localization angle variance of the k-th sub-band of the l-th frame.
5. The method of claim 1, wherein M = 2;
obtaining the sound source localization angles of the i-th microphone and the j-th microphone in the l-th frame according to the complex coherence function values comprises determining the sound source localization angles of the 2 microphones in the k-th sub-band of the l-th frame, the k-th sub-band being formed by combining at least one frequency band;
determining the sound source localization angle variance in the l-th frame according to the sound source localization angles comprises determining the sound source localization angle variance in the l-th frame according to the sound source localization angles of the 2 microphones in the k-th sub-band of the l-th frame.
6. A method of wind noise suppression, the method comprising:
receiving speech and/or audio signals from M microphones, where M is an integer greater than 1;
performing complex coherence function estimation on the speech and/or audio signals to obtain complex coherence function values of the short-time spectra of the i-th microphone and the j-th microphone in the k-th frequency band of the l-th frame, wherein l and k are natural numbers, i = 1, 2, ..., M and j = 1, 2, ..., M;
obtaining the sound source localization angles of the i-th microphone and the j-th microphone in the l-th frame according to the complex coherence function values;
determining the sound source localization angle variance in the l-th frame according to the sound source localization angles;
determining a wind noise gain function of the l-th frame according to the sound source localization angle variance of the l-th frame;
and suppressing wind noise present in the speech and/or audio signals according to the wind noise gain function of the l-th frame.
7. The method of claim 6, wherein the complex coherence function values include a first complex coherence function value obtained by long-term complex coherence function estimation using a first smoothing factor and a second complex coherence function value obtained by instantaneous complex coherence function estimation using a second smoothing factor, the first smoothing factor being greater than the second smoothing factor; the method comprises determining a mean squared modulus of the complex coherence function for the l-th frame according to at least one of the first complex coherence function value and the second complex coherence function value; and determining whether wind noise exists in the l-th frame according to the sound source localization angle variance comprises determining whether wind noise exists in the l-th frame according to the sound source localization angle variance together with the mean squared modulus of the complex coherence function of the l-th frame.
8. The method of claim 7, wherein the first smoothing factor $\alpha_1 \in [0.7, 0.9]$ and the second smoothing factor $\alpha_2 \in [0.4, 0.6]$.
9. The method of claim 6, wherein:
obtaining the sound source localization angles of the i-th microphone and the j-th microphone in the l-th frame according to the complex coherence function values comprises determining the sound source localization angles of the i-th microphone and the j-th microphone in the k-th sub-band of the l-th frame, the k-th sub-band being formed by combining at least one frequency band;
determining the sound source localization angle variance in the l-th frame according to the sound source localization angles comprises determining the sound source localization angle variance in the k-th sub-band of the l-th frame according to the sound source localization angles of the i-th microphone and the j-th microphone in the k-th sub-band of the l-th frame; and determining whether wind noise exists in the l-th frame according to the sound source localization angle variance comprises determining whether wind noise exists in the k-th sub-band of the l-th frame according to the sound source localization angle variance of the k-th sub-band of the l-th frame.
10. The method of claim 6, wherein M = 2;
obtaining the sound source localization angles of the i-th microphone and the j-th microphone in the l-th frame according to the complex coherence function values comprises determining the sound source localization angles of the 2 microphones in the k-th sub-band of the l-th frame, the k-th sub-band being formed by combining at least one frequency band;
determining the sound source localization angle variance in the l-th frame according to the sound source localization angles comprises determining the sound source localization angle variance in the l-th frame according to the sound source localization angles of the 2 microphones in the k-th sub-band of the l-th frame.
11. The method of claim 6, wherein the method comprises estimating a harmonic-to-noise ratio in the l-th frame and determining a first speech protection gain function for the l-th frame based on that harmonic-to-noise ratio; and suppressing the wind noise present in the speech and/or audio signals according to the wind noise gain function of the l-th frame comprises suppressing the wind noise present in the speech and/or audio signals in combination with the first speech protection gain function of the l-th frame.
12. The method of claim 6, wherein the method comprises estimating a high-to-low frequency energy ratio in the l-th frame and determining a second speech protection gain function for the l-th frame based on that energy ratio; and suppressing the wind noise present in the speech and/or audio signals according to the wind noise gain function of the l-th frame comprises suppressing the wind noise present in the speech and/or audio signals in combination with the second speech protection gain function of the l-th frame.
13. A wind noise detection apparatus, the apparatus comprising:
a receiving module configured to receive speech and/or audio signals from M microphones, wherein M is an integer greater than 1;
an estimation module configured to perform complex coherence function estimation on the speech and/or audio signals to obtain complex coherence function values of the short-time spectra of the i-th microphone and the j-th microphone in the k-th frequency band of the l-th frame, wherein l and k are natural numbers, i = 1, 2, ..., M and j = 1, 2, ..., M;
an angle calculation module configured to obtain the sound source localization angles of the i-th microphone and the j-th microphone in the l-th frame according to the complex coherence function values;
an angle variance determining module configured to determine the sound source localization angle variance in the l-th frame according to the sound source localization angles;
and a wind noise determining module configured to determine whether wind noise exists in the l-th frame according to the sound source localization angle variance.
14. A wind noise suppression apparatus, characterized in that the apparatus comprises:
a receiving module configured to receive speech and/or audio signals from M microphones, wherein M is an integer greater than 1;
an estimation module configured to perform complex coherence function estimation on the speech and/or audio signals to obtain complex coherence function values of the short-time spectra of the i-th microphone and the j-th microphone in the k-th frequency band of the l-th frame, wherein l and k are natural numbers, i = 1, 2, ..., M and j = 1, 2, ..., M;
an angle calculation module configured to obtain the sound source localization angles of the i-th microphone and the j-th microphone in the l-th frame according to the complex coherence function values;
an angle variance determining module configured to determine the sound source localization angle variance in the l-th frame according to the sound source localization angles;
a wind noise gain determining module configured to determine a wind noise gain function of the l-th frame according to the sound source localization angle variance;
and a suppression module configured to suppress wind noise present in the speech and/or audio signals according to the wind noise gain function of the l-th frame.
15. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method of any of claims 1 to 12.
16. A computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 12.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810935974.7A CN109215677B (en) | 2018-08-16 | 2018-08-16 | Wind noise detection and suppression method and device suitable for voice and audio |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109215677A (en) | 2019-01-15 |
CN109215677B (en) | 2020-09-29 |
Family
ID=64989091
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810935974.7A Active CN109215677B (en) | 2018-08-16 | 2018-08-16 | Wind noise detection and suppression method and device suitable for voice and audio |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109215677B (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB201818959D0 (en) | 2018-11-21 | 2019-01-09 | Nokia Technologies Oy | Ambience audio representation and associated rendering |
GB201902812D0 (en) | 2019-03-01 | 2019-04-17 | Nokia Technologies Oy | Wind noise reduction in parametric audio |
CN111223493B (en) * | 2020-01-08 | 2022-08-02 | 北京声加科技有限公司 | Voice signal noise reduction processing method, microphone and electronic equipment |
CN111833890B (en) * | 2020-07-13 | 2023-07-25 | 北京声加科技有限公司 | Device and method for automatically detecting wearing state of helmet |
CN112309420B (en) * | 2020-10-30 | 2023-06-27 | 出门问问(苏州)信息科技有限公司 | Method and device for detecting wind noise |
CN112309418B (en) * | 2020-10-30 | 2023-06-27 | 出门问问(苏州)信息科技有限公司 | Method and device for inhibiting wind noise |
CN112242148B (en) * | 2020-11-12 | 2023-06-16 | 北京声加科技有限公司 | Headset-based wind noise suppression method and device |
CN112802486B (en) * | 2020-12-29 | 2023-02-14 | 紫光展锐(重庆)科技有限公司 | Noise suppression method and device and electronic equipment |
CN112884975A (en) * | 2021-01-22 | 2021-06-01 | 李习平 | Scenic spot commodity selling system based on solar street lamp |
CN113380266B (en) * | 2021-05-28 | 2022-06-28 | 中国电子科技集团公司第三研究所 | Miniature dual-microphone speech enhancement method and miniature dual-microphone |
US11670326B1 (en) * | 2021-06-29 | 2023-06-06 | Amazon Technologies, Inc. | Noise detection and suppression |
CN113707170A (en) * | 2021-08-30 | 2021-11-26 | 展讯通信(上海)有限公司 | Wind noise suppression method, electronic device, and storage medium |
CN115691556B (en) * | 2023-01-03 | 2023-03-14 | 北京睿科伦智能科技有限公司 | Method for detecting multichannel voice quality of equipment side |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1530928A (en) * | 2003-02-21 | 2004-09-22 | 哈曼贝克自动系统-威美科公司 | System for inhibitting wind noise |
WO2013187946A2 (en) * | 2012-06-10 | 2013-12-19 | Nuance Communications, Inc. | Wind noise detection for in-car communication systems with multiple acoustic zones |
CN104157295A (en) * | 2014-08-22 | 2014-11-19 | 中国科学院上海高等研究院 | Method used for detecting and suppressing transient noise |
CN105792071A (en) * | 2011-02-10 | 2016-07-20 | 杜比实验室特许公司 | System and method for wind detection and suppression |
CN106161751A (en) * | 2015-04-14 | 2016-11-23 | 电信科学技术研究院 | A kind of noise suppressing method and device |
CN106448693A (en) * | 2016-09-05 | 2017-02-22 | 华为技术有限公司 | Speech signal processing method and apparatus |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8433564B2 (en) * | 2009-07-02 | 2013-04-30 | Alon Konchitsky | Method for wind noise reduction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||