CN109215677B

CN109215677B - Wind noise detection and suppression method and device suitable for voice and audio

Info

Publication number: CN109215677B
Application number: CN201810935974.7A
Authority: CN
Inventors: 邱锋海; 匡敬辉
Original assignee: Beijing Sound+ Technology Co ltd
Current assignee: Beijing Sound+ Technology Co ltd
Priority date: 2018-08-16
Filing date: 2018-08-16
Publication date: 2020-09-29
Anticipated expiration: 2038-08-16
Also published as: CN109215677A

Abstract

The invention provides a wind noise detection and suppression method. In one embodiment, the method comprises: performing complex coherence function estimation on the speech or audio to obtain the sound source localization angle of each pair of microphones among a plurality of microphones in each speech frame, and from these the sound source localization angle variance of the speech frame; and determining whether wind noise exists in the speech frame according to the sound source localization angle variance. Experiments show that the method can detect and suppress strong non-stationary wind noise in real time without obvious speech or audio distortion.

Description

Wind noise detection and suppression method and device suitable for voice and audio

Technical Field

The present application relates to the field of noise processing, and in particular to a method and an apparatus for real-time wind noise detection and suppression for speech and audio.

Background

Conventional single-channel speech enhancement methods assume that the noise is stationary, so that the stationary noise power spectrum can be estimated by a noise estimation method and the stationary noise finally suppressed [1]. Multi-channel speech enhancement methods can exploit the spatial separation between the target speech and the interfering noise to perform spatial filtering, for example by beamforming, and thereby suppress both stationary and non-stationary noise [2]. Current deep-learning-based methods are computationally expensive, and their environmental adaptability and generalization capability still need to be improved.

Wind noise is strongly non-stationary. It is also not a point sound source and has no obvious directivity, so neither a traditional single-channel speech enhancement method nor a multi-channel speech enhancement method such as spatial filtering can detect and suppress it effectively. In recent years, some researchers have proposed detecting and suppressing wind noise with deep learning or non-negative matrix factorization methods [3]. These methods are computationally expensive and generally cannot guarantee real-time performance; in particular, they are not generally applicable to real-time, low-power communication.

Disclosure of Invention

In a first aspect, an embodiment of the present invention provides a wind noise detection method. The method comprises the following steps: receiving speech and/or audio signals from M microphones, where M is an integer greater than 1; performing complex coherence function estimation on the speech and/or audio signals to obtain complex coherence function values of the k-th frequency band short-time spectrum of the l-th frame for the i-th microphone and the j-th microphone, where l and k are natural numbers, i = 1, 2, ..., M, j = 1, 2, ..., M, and k = 1, 2, ..., K; obtaining the sound source localization angle of the i-th microphone and the j-th microphone in the l-th frame according to the complex coherence function values; determining the sound source localization angle variance in the l-th frame according to the sound source localization angles of the M microphones in the l-th frame; and determining whether wind noise exists in the l-th frame according to the sound source localization angle variance.

In a second aspect, a wind noise suppression method is provided. The method comprises the following steps: receiving speech and/or audio signals from M microphones, where M is an integer greater than 1; performing complex coherence function estimation on the speech and/or audio signals to obtain complex coherence function values of the k-th frequency band short-time spectrum of the l-th frame for the i-th microphone and the j-th microphone, where l and k are natural numbers, i = 1, 2, ..., M, j = 1, 2, ..., M; obtaining the sound source localization angle of the i-th microphone and the j-th microphone in the l-th frame according to the complex coherence function values; determining the sound source localization angle variance in the l-th frame according to the sound source localization angles; determining a wind noise gain function in the l-th frame according to the sound source localization angle variance; and suppressing wind noise present in the speech and/or audio signals according to the wind noise gain function of the l-th frame.

In a third aspect, a wind noise detection apparatus is provided. The wind noise detection apparatus includes: a receiving module configured to receive speech and/or audio signals from M microphones, where M is an integer greater than 1; an estimation module configured to perform complex coherence function estimation on the speech and/or audio signals to obtain complex coherence function values of the k-th frequency band short-time spectrum of the l-th frame for the i-th microphone and the j-th microphone, where l and k are natural numbers, i = 1, 2, ..., M, j = 1, 2, ..., M; an angle calculation module configured to obtain the sound source localization angle of the i-th microphone and the j-th microphone in the l-th frame according to the complex coherence function values; an angle variance determining module configured to determine the sound source localization angle variance in the l-th frame according to the sound source localization angles; and a wind noise determining module configured to determine whether wind noise exists in the l-th frame according to the sound source localization angle variance.

In a fourth aspect, a wind noise suppression apparatus is provided. The wind noise suppression apparatus includes: a receiving module configured to receive speech and/or audio signals from M microphones, where M is an integer greater than 1; an estimation module configured to perform complex coherence function estimation on the speech and/or audio signals to obtain complex coherence function values of the k-th frequency band short-time spectrum of the l-th frame for the i-th microphone and the j-th microphone, where l and k are natural numbers, i = 1, 2, ..., M, j = 1, 2, ..., M; an angle calculation module configured to obtain the sound source localization angle of the i-th microphone and the j-th microphone in the l-th frame according to the complex coherence function values; an angle variance determining module configured to determine the sound source localization angle variance in the l-th frame according to the sound source localization angles; a wind noise gain determining module configured to determine a wind noise gain function of the l-th frame according to the sound source localization angle variance; and a suppression module configured to suppress wind noise present in the speech and/or audio signals according to the wind noise gain function of the l-th frame.

In a fifth aspect, a computer-readable storage medium is provided. The medium comprises instructions which, when run on a computer, cause the computer to perform the method according to the first or second aspect.

In a sixth aspect, a computer program product containing instructions is provided. When run on a computer, the instructions cause the computer to perform the method according to the first or second aspect.

According to the embodiments of the invention, wind noise is effectively suppressed by performing wind noise detection and wind noise suppression on the speech and/or audio signals, while speech and audio distortion is avoided as far as possible so that speech and audio quality are preserved. Experiments show that the method can detect and suppress strong non-stationary wind noise in real time without obvious speech or audio distortion.

Drawings

FIG. 1 is a schematic diagram of a signal model;

FIG. 2 is a schematic block diagram of a multi-microphone wind noise suppression according to an embodiment of the present invention;

FIG. 3 compares the squared magnitude of the complex coherence function for wind noise and for directional target speech;

FIG. 4 is a schematic illustration of the sound source localization angle over time for wind noise and for directional target speech;

FIG. 5 is a schematic block diagram of full-band wind noise detection based on the squared magnitude and phase of the complex coherence function;

FIG. 6 is a schematic diagram of sub-band wind noise detection based on the squared magnitude and phase of the complex coherence function;

FIG. 7 is a schematic diagram of a gain function protection strategy based on speech harmonic characteristics and high and low frequency energy ratios;

FIG. 8 is a graph illustrating the processing of wind noise detection and suppression using an embodiment of the present invention; wherein (a) is a time domain map of speech contaminated by wind noise; (b) is a speech spectrogram polluted by wind noise; (c) is a speech time domain graph after wind noise suppression; (d) is a speech spectrogram with wind noise suppression.

Detailed Description

The embodiments of the invention provide a real-time wind noise detection and suppression method with low algorithmic complexity. The method is suitable for speech and audio signals, can be applied in real-time speech and audio communication systems, and can also be applied to non-real-time speech and audio signal enhancement. An embodiment of the invention performs complex coherence function estimation on the speech or audio to obtain the sound source localization angle of each pair of microphones among a plurality of microphones in each speech frame, and from these the sound source localization angle variance of the speech frame; it then determines whether wind noise exists in the speech frame according to the sound source localization angle variance.

In another embodiment of the present invention, a wind noise gain function at a speech or audio frame may be determined based on a sound source localization angle variance at the speech or audio frame; wind noise present in the speech and/or audio signal is then suppressed according to a wind noise gain function for the speech or audio frame.

When the complex coherence function is estimated, a smoothing factor that is too large tends to blur instantaneous changes, for example causing distortion where the energy is rising, such as at a speech onset; a smoothing factor that is too small tends to cause valid speech to be suppressed as wind noise where the energy is weak, such as at the end of a speech segment. To this end, in one embodiment, dual or multiple smoothing factors, or an adaptive smoothing factor, may be used for complex coherence function estimation. The complex coherence function estimate may be used to determine the mean squared magnitude of the complex coherence function for the speech or audio frame, thereby assisting wind noise detection and suppression.

In one embodiment, a sub-band wind noise detection strategy may be employed to combine at least one of the K frequency bands into a sub-band, and then determine the sound source localization angles and sound source localization angle variances of the M microphones two by two with respect to each other at multiple sub-bands of the speech or audio frame. Thereby, the wind noise is detected and suppressed.

In one embodiment, in view of the harmonic characteristics of voiced sounds, the harmonic-to-noise ratio in the l-th frame is used to protect voiced sounds.

In one embodiment, considering that wind noise has strong low-frequency energy and unvoiced sounds have strong high-frequency energy, the high-to-low frequency energy ratio can be used to protect unvoiced sounds.

The present invention will be described below with reference to specific examples.

Fig. 1 is a schematic diagram of a signal model. Fig. 1 illustrates a variety of sound signals, including a target speech signal, a directional interferer, stationary noise, and wind noise. By way of example, the speech environment includes speakers A and B, both of whom are speaking. Assuming that the speech of speaker A is the target of the speech processing, the speech signal of speaker B constitutes directional interference noise with respect to the target speech signal. The same speech environment may also contain noise from, for example, a car, and possibly wind-induced noise. Automotive noise is one type of stationary noise. Wind noise has characteristics that differ from both directional interference noise and stationary noise, as discussed below.

In order to collect the target speech signal, M microphones are provided in the speech environment. Suppose that the signal x_i(n) received by the i-th microphone is:

x_i(n) = s_i(n) + d_{t,i}(n) + d_{s,i}(n) + d_{w,i}(n) (1)

where s_i(n), d_{t,i}(n), d_{s,i}(n) and d_{w,i}(n) are, respectively, the target speech or audio signal (hereinafter simply referred to as speech for convenience of description), the directional interference, the stationary noise, and the wind noise received by the i-th microphone; i = 1, 2, ..., M, where M is the number of microphones.

When multiple microphones are used (M > 1), the directional interference noise d_{t,i}(n) can be effectively suppressed by beamforming, and the environmental noise can be suppressed by post-filtering; the wind noise is then effectively suppressed by the wind noise suppression method provided by the embodiments of the invention.

Fig. 2 is a schematic block diagram of the wind noise suppression principle of a multi-microphone. As shown in fig. 2, the wind noise suppression includes beamforming 21, post-processing 22, wind noise detection (23-24), wind noise gain estimation 25, and so on. In view of the protection of voiced and unvoiced sounds, harmonic noise ratio and high-low energy ratio estimation 26 may also be performed, and further wind noise gain function estimation 27 based on speech characteristics may be performed.

When multiple microphones are used (M > 1), beamforming applied to the speech signals x_i(n) (i = 1, 2, ..., M) picked up by the M microphones can effectively suppress the directional interference noise d_{t,i}(n). The beamforming may use a fixed beamforming method or an adaptive beamforming method. Fixed beamforming methods include Delay-and-Sum Beamforming (DSB), Delay-and-Filter Beamforming (DFB), and Robust Superdirective Beamforming (RSB). Adaptive beamforming algorithms include the Generalized Sidelobe Canceller (GSC), the Minimum Variance Distortionless Response (MVDR) method, the Multi-channel Wiener Filter (MWF), and the like.
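As a minimal illustration of the simplest fixed beamformer named above, the following NumPy sketch applies delay-and-sum beamforming to one frame in the frequency domain. The function name `delay_and_sum`, the assumption that the per-microphone arrival delays of the target are already known, and the test values are all illustrative assumptions; the patent does not prescribe this implementation.

```python
import numpy as np

def delay_and_sum(X, delays, fs):
    """Frequency-domain delay-and-sum beamformer for one frame.

    X: complex array of shape (mics, bands) holding one-sided spectra.
    delays: arrival delay of the target at each microphone, in seconds
            (assumed known here; a real system has to estimate them).
    """
    X = np.asarray(X)
    n_mics, n_bands = X.shape
    omega = 2 * np.pi * np.linspace(0.0, fs / 2, n_bands)  # rad/s per band
    steering = np.exp(1j * np.outer(delays, omega))        # undoes each delay
    return np.mean(X * steering, axis=0)                   # align, then average

# Synthetic check: the same source spectrum arrives with three different delays.
fs = 16000
rng = np.random.default_rng(0)
S = rng.standard_normal(129) + 1j * rng.standard_normal(129)
delays = np.array([0.0, 1e-4, 2e-4])
omega = 2 * np.pi * np.linspace(0.0, fs / 2, 129)
X = S[None, :] * np.exp(-1j * np.outer(delays, omega))
Y = delay_and_sum(X, delays, fs)
```

With correct delays the target adds coherently while signals from other directions, and uncorrelated noise such as wind, add incoherently.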

In the post-processing step, the signal after the beam forming can be filtered by adopting a filtering method to eliminate the environmental noise. The multi-channel post-filtering method can adopt a coherence-based method, an energy-based method or a combination of the two methods, and the like, and can also adopt a spectral subtraction method, a subspace method and the like for the purpose of steady-state noise suppression. In one example, the residual directional noise and stationary noise are preprocessed using spectral subtraction in consideration of stability and computation amount.

Let x(n) denote the output obtained from the signals received by the multiple microphones after beamforming and post-filtering. Neglecting residual stationary noise:

x(n)＝s(n)+d_w(n) (2)

where s(n) is the estimated target speech component and d_w(n) is the residual wind noise. The frequency-domain representation of Equation (2) is:

X(k,l)＝S(k,l)+D_w(k,l) (3)

where X(k, l), S(k, l) and D_w(k, l) are the k-th frequency band short-time spectra of the l-th frame of x(n), s(n) and d_w(n), respectively, which can be computed by the Fast Fourier Transform (FFT). Similarly, applying the FFT to the signal x_i(n) received by the i-th microphone gives X_i(k, l), the corresponding k-th frequency band short-time spectrum of the l-th frame.
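As a minimal sketch of how the short-time spectrum X(k, l) could be computed, the NumPy code below splits a signal into overlapping frames, applies a window, and takes an FFT per frame. The frame length, hop size, Hann window, and the function name `stft_frames` are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def stft_frames(x, frame_len=512, hop=256):
    """Return X[k, l]: the k-th band short-time spectrum of the l-th frame."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)               # assumed analysis window
    n_frames = 1 + (len(x) - frame_len) // hop   # only full frames are kept
    X = np.empty((frame_len // 2 + 1, n_frames), dtype=complex)
    for l in range(n_frames):
        frame = x[l * hop : l * hop + frame_len] * window
        X[:, l] = np.fft.rfft(frame)             # bands k = 0 .. frame_len / 2
    return X

# One second of a 1 kHz tone sampled at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
X = stft_frames(np.sin(2 * np.pi * 1000 * t))
```

With these assumed settings each band is 31.25 Hz wide, so the 1 kHz tone falls in band k = 32.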

After beamforming and post-processing, wind noise detection is performed on the signal. Fig. 5 is a schematic block diagram of full-band wind noise detection based on the squared magnitude and phase of the complex coherence function.

First, complex coherence function estimation with dual smoothing factors is performed. In view of the strongly non-stationary and high-energy nature of wind noise, a method for estimating the complex coherence function with dual smoothing factors is proposed, so that wind noise does not affect weak speech and audio.

The traditional complex coherence function estimation method uses a fixed smoothing factor, as expressed by Equation (4). In that equation, C_ij(k, l) is the complex coherence function value of the k-th frequency band short-time spectrum of the l-th frame for the i-th microphone and the j-th microphone; a small quantity greater than 0 is included to avoid division by zero; l and k are natural numbers, i = 1, 2, ..., M, j = 1, 2, ..., M.

R_ij(k, l) denotes the auto-power and cross-power spectra: when i = j, R_ij(k, l) = R_ii(k, l) = R_jj(k, l) is the auto-power spectrum; when i ≠ j, it is the cross-power spectrum. α is a fixed smoothing factor, generally between 0 and 1. To reduce the estimation bias, α should be close to 1, but this distorts the onset segments of speech and audio; in particular, when the wind noise energy is high, it distorts the speech and audio signals for a long time.
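Equation (4) is not reproduced in this text. A commonly used recursive form that is consistent with the definitions above (a smoothing factor α, auto- and cross-power spectra R_ij, and a small positive quantity in the denominator) is sketched below. The symbol ε for that small quantity and its exact placement in the denominator are assumptions, so this should be read as a plausible form rather than the patent's exact formula.

```latex
R_{ij}(k,l) = \alpha\, R_{ij}(k,l-1) + (1-\alpha)\, X_i(k,l)\, X_j^{*}(k,l)

C_{ij}(k,l) = \frac{R_{ij}(k,l)}{\sqrt{R_{ii}(k,l)\, R_{jj}(k,l)} + \varepsilon}
```

In this form a larger α averages over more past frames, which lowers the estimation variance but makes C_ij(k, l) react more slowly to onsets.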

In one embodiment, a dual-smoothing-factor strategy addresses the problem of the fixed smoothing factor: a large smoothing factor α₁ ∈ [0.7, 0.9] is used to estimate the long-term complex coherence function, and a small smoothing factor α₂ ∈ [0.4, 0.6] is used to estimate the instantaneous complex coherence function. The long-term and instantaneous complex coherence functions are both calculated with Equation (4), except that α is replaced by the large smoothing factor α₁ and the small smoothing factor α₂, respectively. It is of course also possible to use three or more smoothing factors and estimate the complex coherence function several times in order to protect the speech and audio signals; this is a simple extension of the dual-smoothing-factor strategy and should also fall within the scope of protection of the present patent.
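The dual-smoothing-factor idea can be sketched in NumPy as follows. The recursion inside `smoothed_coherence` is a standard first-order smoother assumed for illustration (the patent's Equation (4) is not reproduced in this text), and the function name, the `eps` constant, and the synthetic signals are likewise hypothetical.

```python
import numpy as np

def smoothed_coherence(X_i, X_j, alpha, eps=1e-12):
    """Recursively smoothed complex coherence C_ij[k, l] for one smoothing factor.

    X_i, X_j: complex short-time spectra of shape (bands, frames).
    """
    n_bands, n_frames = X_i.shape
    R_ii = np.zeros(n_bands)
    R_jj = np.zeros(n_bands)
    R_ij = np.zeros(n_bands, dtype=complex)
    C = np.empty((n_bands, n_frames), dtype=complex)
    for l in range(n_frames):
        # Assumed first-order recursive averages of auto- and cross-power spectra.
        R_ii = alpha * R_ii + (1 - alpha) * np.abs(X_i[:, l]) ** 2
        R_jj = alpha * R_jj + (1 - alpha) * np.abs(X_j[:, l]) ** 2
        R_ij = alpha * R_ij + (1 - alpha) * X_i[:, l] * np.conj(X_j[:, l])
        C[:, l] = R_ij / (np.sqrt(R_ii * R_jj) + eps)
    return C

rng = np.random.default_rng(0)
shape = (129, 50)
S = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
# Coherent case: microphone j sees the same source with a fixed phase shift.
C_long = smoothed_coherence(S, S * np.exp(-0.3j), alpha=0.8)  # alpha_1 in [0.7, 0.9]
C_inst = smoothed_coherence(S, S * np.exp(-0.3j), alpha=0.5)  # alpha_2 in [0.4, 0.6]
# Incoherent case: independent signals at the two microphones (wind-like).
N = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
C_wind = smoothed_coherence(S, N, alpha=0.8)
```

In this synthetic setup the squared magnitude stays at 1 for the coherent pair regardless of the smoothing factor, while for the independent pair it drops well below 1 once a few frames have been averaged.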

Further, in one embodiment, an adaptive smoothing factor α may be used. For example, α is adjusted adaptively according to the stationarity of the signal picked up by the microphones, the absolute value of the complex coherence function, or the signal-to-wind-noise ratio, so as to protect the speech and audio signals; this also falls within the scope of protection of the present patent.

Next, full-band wind noise detection is performed based on the squared magnitude and phase of the complex coherence function.

Wind noise differs from a directional target speech or audio source in two respects. First, the squared magnitude of the complex coherence function of wind noise between different microphones is close to 0, whereas that of directional target speech or audio is close to 1. Fig. 3 shows the squared magnitude of the complex coherence function for wind noise and for a directional target speech source, which confirms this difference. Second, wind noise is not a directional sound source, and its sound source localization angle is irregular between adjacent frames. For a directional sound source, the direction of the point source can be found by a sound source localization method, and the localization angle is continuous from frame to frame. Environmental noise close to a diffuse field has a zero-phase characteristic and also shows good inter-frame continuity of the sound source localization angle. Fig. 4 shows the sound source localization angle over time for wind noise and for directional target speech: during the first 5 seconds the target speech is localized at 0 degrees and the angle jitters only in silent segments, whereas in seconds 6 to 10, where wind noise is present, the localization angle jumps widely. Wind noise can therefore be detected from the continuity of the source direction, and the sound source angle variance can be calculated over multiple frames; when the variance exceeds a certain threshold, the current frame can be preliminarily judged to be wind noise.

The squared magnitude of the complex coherence function is further combined with this criterion, and wind noise is detected by a dual-threshold decision, as shown in the schematic block diagram of Fig. 5. The specific steps are as follows:

1) Perform pairwise microphone sound source localization based on the complex coherence function, using a method such as GCC-PHAT (Generalized Cross-Correlation with Phase Transform) or SCOT-PHAT (Standard Cross-correlation Phase Transform). Let θ_ij(l) denote the sound source localization angle of the i-th microphone and the j-th microphone in the l-th frame. All or part of the frequency bands of the complex coherence function may be used for sound source localization.

2) Calculate the sound source localization angle variance in the l-th frame, i.e., the variance of the M(M-1)/2 angles θ_ij(l) (Equation (5)).

3) Calculate the mean squared magnitude of the complex coherence function for each frame (Equation (6)), where k_low and k_up are the lower and upper limits of the frequency band, respectively; the minimum value of k_low is 0, and the maximum value of k_up is 1/2 of the FFT frame length.

4) Perform wind noise detection according to the mean squared magnitude of the complex coherence function and the sound source localization angle variance (Equation (7)). The decision compares these quantities with thresholds, and q(l) = 1 indicates that the l-th frame is wind noise.

The mean squared magnitude of the complex coherence function assists the wind noise detection based on the sound source localization angle variance, which helps reduce the speech distortion that the detection might otherwise introduce.
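The steps above can be sketched in NumPy as follows. `gcc_phat_angle` is a textbook time-domain GCC-PHAT estimate for one microphone pair, assuming a far-field source and an angle measured from broadside; `detect_wind_frame` assumes a simple rule (high angle variance and low mean squared coherence). The function names, the speed of sound, the sign convention, and both threshold values are assumptions for illustration and are not the patent's Equations (5) to (7).

```python
import numpy as np

def gcc_phat_angle(x_i, x_j, fs, mic_distance, c=343.0):
    """Estimate a localization angle in degrees for one microphone pair via GCC-PHAT."""
    n = len(x_i) + len(x_j)
    cross = np.fft.rfft(x_i, n=n) * np.conj(np.fft.rfft(x_j, n=n))
    cross /= np.abs(cross) + 1e-12                   # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = int(np.ceil(fs * mic_distance / c))  # largest physically possible lag
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    tau = (np.argmax(np.abs(cc)) - max_shift) / fs   # time difference of arrival
    return np.degrees(np.arcsin(np.clip(tau * c / mic_distance, -1.0, 1.0)))

def detect_wind_frame(pair_angles, mean_sq_coherence, var_th=400.0, coh_th=0.5):
    """Return q(l): 1 if the frame is judged to be wind noise, else 0.

    Assumed rule: the angle variance is high AND the mean squared coherence is low.
    """
    angle_var = np.var(pair_angles)                  # variance over the M(M-1)/2 pairs
    return int(angle_var > var_th and mean_sq_coherence < coh_th)

# Consistent pairwise angles with high coherence, then scattered angles with low coherence.
q_speech = detect_wind_frame([2.0, -1.0, 0.5], mean_sq_coherence=0.9)
q_wind = detect_wind_frame([60.0, -45.0, 10.0], mean_sq_coherence=0.1)
```

In practice the thresholds would have to be tuned to the array geometry and frame length; the values used here only separate the two synthetic cases.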

Fig. 6 presents an alternative to Fig. 5. The alternative performs sub-band wind noise detection based on the squared magnitude and phase of the complex coherence function.

When there are only two microphones, that is, M = 2, the variance cannot be calculated with Equation (5), and the decision of Equation (7) can rely only on the mean squared magnitude of the complex coherence function. To solve this problem and to reduce speech distortion in frequency bands unaffected by wind noise, a sub-band wind noise detection strategy is further proposed that exploits the irregularity of the wind noise source localization angle between adjacent sub-bands. The specific steps are as follows:

1) Combine multiple frequency bands into sub-bands, for example one sub-band per 500 Hz, and perform sound source localization on each sub-band. The sound source localization angle of each sub-band is θ_ij(κ, l), where κ denotes the sub-band index.

2) Calculate the variance of the M(M-1)/2 angles θ_ij(κ, l) (Equation (8)). When M = 2, there is only one angle θ₁₂(κ, l) per sub-band, and the variance is calculated by Equation (9), where κ_up is the sub-band upper bound. When M = 2, the sound source angle variance used for wind noise detection in Equation (7) can thus be calculated by Equations (8) and (9).

3) Calculate the sub-band mean squared magnitudes of the complex coherence function, C^a(κ, l) and C^b(κ, l), in the same way as Equation (6), except that k_low and k_up are set to the lower and upper limits of each sub-band.

4) Perform sub-band wind noise detection according to the sub-band mean squared magnitudes of the complex coherence function and the sub-band sound source angle variance. The decision compares these quantities with thresholds, and q(κ, l) = 1 indicates that the κ-th sub-band of the l-th frame is wind noise.

The sub-band wind noise detection scheme achieves strong suppression of wind noise while avoiding distortion of speech and audio in frequency bands free of wind noise.
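For the two-microphone case, the idea of measuring how irregular the localization angle is across sub-bands can be sketched as below. The code groups FFT bands into sub-bands of about 500 Hz, estimates one angle per sub-band from the phase slope of the cross-power spectrum by least squares, and takes the variance over sub-bands. The least-squares estimator, the far-field geometry, the function names, and the synthetic spectra are assumptions for illustration; they are not the patent's Equations (8) and (9).

```python
import numpy as np

def subband_edges(n_bands, fs, subband_hz=500.0):
    """Band indices that delimit sub-bands of roughly subband_hz each."""
    hz_per_band = (fs / 2) / (n_bands - 1)
    step = max(1, int(round(subband_hz / hz_per_band)))
    return list(range(0, n_bands, step)) + [n_bands]

def subband_angles(cross_spec, fs, mic_distance, edges, c=343.0):
    """One angle (degrees) per sub-band from the phase of a cross-power spectrum."""
    n_bands = len(cross_spec)
    omega = 2 * np.pi * np.linspace(0.0, fs / 2, n_bands)  # rad/s per band
    phase = np.angle(cross_spec)
    angles = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        w, p = omega[lo:hi], phase[lo:hi]
        tau = np.sum(w * p) / (np.sum(w * w) + 1e-12)       # least-squares delay
        angles.append(np.degrees(np.arcsin(np.clip(tau * c / mic_distance, -1, 1))))
    return np.array(angles)

fs, n_bands, d = 16000, 257, 0.05
edges = subband_edges(n_bands, fs)
omega = 2 * np.pi * np.linspace(0.0, fs / 2, n_bands)
speech_cross = np.exp(1j * omega * 5e-5)        # one fixed delay, no phase wrapping
rng = np.random.default_rng(1)
wind_cross = np.exp(1j * rng.uniform(-np.pi, np.pi, n_bands))  # random phases
var_speech = np.var(subband_angles(speech_cross, fs, d, edges))
var_wind = np.var(subband_angles(wind_cross, fs, d, edges))
```

A single directional source yields nearly the same angle in every sub-band, so its variance is close to zero, whereas random inter-microphone phases give widely scattered sub-band angles. The phase-slope estimate assumes the phase does not wrap, which only holds for small microphone spacings.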

When wind noise is detected, the harmonic-to-noise ratio and the high-to-low frequency energy ratio can be combined to protect the speech while the wind noise is suppressed.

Fig. 7 is a schematic diagram of a gain function protection strategy based on speech harmonic characteristics and high-low frequency energy ratio. In the upper part of fig. 7, the process of protecting voiced sounds with harmonic noise ratio is illustrated.

Voiced sounds have harmonic characteristics, and in order to extract the fundamental frequency of harmonics, a cepstrum-based method and a frequency domain-based method may be employed. The method comprises the following specific steps.

1) First, calculate the power cepstrum coefficients c(λ, l), where λ is the cepstral coefficient index.

2) Decide from c(λ, l) whether the current frame is voiced. In the decision rule, c_th is a threshold, and λ_min and λ_max are the minimum and maximum cepstral coefficient indices corresponding to the fundamental frequency of voiced sounds.

3) Set the largest coefficient of c(λ, l) between λ_min and λ_max to 0 and estimate the harmonic-to-noise ratio HNR(k, l). The value of the first speech protection gain function G_{p,1}(k, l) in the k-th band of the l-th frame is then determined from HNR(k, l).

Considering the long-term spectral characteristics, wind noise is generally stronger at low frequencies, whereas unvoiced sounds are generally stronger at high frequencies. The lower part of Fig. 7 illustrates the process of protecting unvoiced sounds according to the high-to-low frequency energy ratio. The specific steps are as follows.

1) Calculate the high-frequency energy P_high(l) and the low-frequency energy P_low(l), where K_min, K_mid and K_high are, respectively, the lowest, middle and highest bands used to calculate the low- and high-frequency energies.

2) Calculate the high-to-low frequency energy ratio of each frame, PR(l) = P_high(l)/P_low(l).

3) Protect unvoiced sounds according to PR(l). The value of the second speech protection gain function G_{p,2}(k, l) in the k-th band of the l-th frame is determined by comparing PR(l) with a threshold PR_th; G_{p,2}(k, l) takes a value of 1 only in the mid-to-high frequency bands where unvoiced sounds may occur.
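The two protection cues can be sketched in NumPy as follows. `is_voiced` looks for a cepstral peak in the quefrency range of plausible pitch periods, and `high_low_energy_ratio` computes PR(l) from one frame's spectrum. The decision rules, the pitch range of 70 to 400 Hz, the threshold values, and all function names are assumptions for illustration, since the patent's corresponding equations are not reproduced in this text.

```python
import numpy as np

def is_voiced(X_frame, fs, c_th=0.3, f0_min=70.0, f0_max=400.0):
    """Cepstral-peak voicing decision for one frame's one-sided spectrum."""
    power_cepstrum = np.fft.irfft(np.log(np.abs(X_frame) ** 2 + 1e-12))
    lam_min = int(fs / f0_max)       # shortest pitch period in samples
    lam_max = int(fs / f0_min)       # longest pitch period in samples
    return bool(np.max(power_cepstrum[lam_min:lam_max + 1]) > c_th)

def high_low_energy_ratio(X_frame, k_min, k_mid, k_high):
    """PR(l) = P_high(l) / P_low(l) for one frame."""
    power = np.abs(X_frame) ** 2
    p_low = np.sum(power[k_min:k_mid])           # bands k_min .. k_mid - 1
    p_high = np.sum(power[k_mid:k_high + 1])     # bands k_mid .. k_high
    return p_high / (p_low + 1e-12)

def unvoiced_protection_gain(n_bands, pr, pr_th, k_mid):
    """Assumed G_p2(k, l): 1 in bands k >= k_mid when PR(l) exceeds PR_th, else 0."""
    gain = np.zeros(n_bands)
    if pr > pr_th:
        gain[k_mid:] = 1.0
    return gain

# A pulse train with a 100-sample period (160 Hz at 16 kHz) is strongly harmonic.
pulses = np.zeros(1024)
pulses[::100] = 1.0
voiced = is_voiced(np.fft.rfft(pulses), fs=16000)

# A toy spectrum whose upper half carries four times the power of the lower half.
toy = np.concatenate((np.ones(4), 2.0 * np.ones(4)))
pr = high_low_energy_ratio(toy, k_min=0, k_mid=4, k_high=7)
g_p2 = unvoiced_protection_gain(8, pr, pr_th=2.0, k_mid=4)
```

The cepstral threshold depends on the frame length and window, so the default used here is only suitable for this toy example.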

When full-band wind noise detection is used, the wind noise gain function is G_1(k, l) = 1 - q(l), i.e., the value 1 - q(l) is assigned to the gain function of all bands in the l-th frame. When sub-band wind noise detection is used, the wind noise gain function is G_1(k, l) = 1 - q(κ, l) for the bands k in the κ-th sub-band, i.e., the value 1 - q(κ, l) is assigned to the gain function of the κ-th sub-band in the l-th frame.

Combining the two speech protection strategies, the final wind noise suppression gain function is:

G_w(k, l) = max{G_1(k, l), G_{p,1}(k, l), G_{p,2}(k, l)} (14)
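Equation (14) can be illustrated with a few lines of NumPy. The function name `final_wind_gain` and the per-band example values are hypothetical; the sketch only shows how the element-wise maximum lets either protection gain override the wind noise gain.

```python
import numpy as np

def final_wind_gain(q, g_p1, g_p2):
    """G_w(k, l) = max{G_1, G_p1, G_p2} with G_1 = 1 - q, evaluated per band."""
    g_1 = 1.0 - np.asarray(q, dtype=float)
    g_p1 = np.asarray(g_p1, dtype=float)
    g_p2 = np.asarray(g_p2, dtype=float)
    return np.maximum(np.maximum(g_1, g_p1), g_p2)

# Four bands of one frame: wind detected in the first three bands.
q = [1, 1, 1, 0]
g_p1 = [0, 1, 0, 0]      # voiced protection keeps band 1
g_p2 = [0, 0, 1, 0]      # unvoiced protection keeps band 2
G_w = final_wind_gain(q, g_p1, g_p2)

# The gain multiplies the short-time spectrum before the inverse FFT.
X_frame = np.array([1 + 1j, 2.0, 3.0, 4.0])
S_hat = G_w * X_frame
```

Only band 0 is suppressed in this example: it is flagged as wind noise and neither protection strategy claims it.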

The final enhanced speech time-domain signal is obtained through the inverse FFT and overlap-add.

Fig. 8 illustrates the effect of wind noise detection and suppression according to an embodiment of the present invention, where (a) is the time-domain waveform of speech contaminated by wind noise; (b) is the spectrogram of speech contaminated by wind noise; (c) is the time-domain waveform of the speech after wind noise suppression; and (d) is the spectrogram of the speech after wind noise suppression. The first 5 seconds are speech and the last 5 seconds are wind noise. As can be seen from Fig. 8, the speech signal is preserved and the wind noise is effectively suppressed.

It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be implemented by a program, and the program may be stored in a computer-readable storage medium. The storage medium is a non-transitory medium, such as a random access memory, a read-only memory, a flash memory, a hard disk, a solid state disk, a magnetic tape, a floppy disk, an optical disk, or any combination thereof.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Cited documents:

[1] Loizou P C. Speech enhancement: theory and practice [M]. CRC Press, 2013.

[2] Ward D, Brandstein M. Microphone arrays: signal processing techniques and applications [J]. 2001.

[3] Schmidt M N, Larsen J, Hsiao F T. "Wind noise reduction using non-negative sparse coding", in IEEE Workshop on Machine Learning for Signal Processing, Aug. 27-29, 2007.

Claims

1. A method of wind noise detection, the method comprising:

receiving speech and/or audio signals from M microphones, where M is an integer greater than 1;

performing complex coherence function estimation on the speech and/or audio signals to obtain complex coherence function values of the k-th frequency band short-time spectrum of the l-th frame for the i-th microphone and the j-th microphone, where l and k are natural numbers, i = 1, 2, ..., M, j = 1, 2, ..., M;

obtaining the sound source localization angles of the i-th microphone and the j-th microphone in the l-th frame according to the complex coherence function values;

determining the sound source localization angle variance in the l-th frame according to the sound source localization angles;

and determining whether wind noise exists in the l-th frame according to the sound source localization angle variance.

2. The method of claim 1, wherein the complex coherence function values include a first complex coherence function value obtained by performing a long term complex coherence function estimation using a first smoothing factor and a second complex coherence function value obtained by performing an instantaneous complex coherence function estimation using a second smoothing factor, the first smoothing factor being greater than the second smoothing factor; the method comprises the steps of determining a complex interference function digital-to-analog square mean value of the ith frame according to at least one of the first complex interference function value and the second complex interference function value; the determining whether wind noise exists in the ith frame according to the sound source positioning angle variance comprises determining whether wind noise exists in the ith frame according to the sound source positioning angle variance by means of a complex interference function digital-to-analog square mean value of the ith frame.

3. The method of claim 2, wherein the first smoothing factor α₁ ∈ [0.7, 0.9] and the second smoothing factor α₂ ∈ [0.4, 0.6].
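
As an illustrative sketch only, the long-term and instantaneous estimates of claims 2 and 3 can be approximated with first-order recursive averaging of the auto- and cross-power spectra, using one smoothing factor from each claimed range. The recursion itself, the function names and the small constant `eps` are assumptions made for this example; the claims state only the ranges of the two smoothing factors and that the first is greater than the second.

```python
import numpy as np

def smoothed_coherence(X1, X2, alpha, eps=1e-12):
    """Recursively smoothed complex coherence of two microphone spectra.

    X1, X2 : complex short-time spectra, shape (frames, bands)
    alpha  : smoothing factor in [0, 1); larger means longer memory
    Returns an array of shape (frames, bands).
    """
    n_frames, n_bands = X1.shape
    p11 = np.zeros(n_bands)
    p22 = np.zeros(n_bands)
    p12 = np.zeros(n_bands, dtype=complex)
    gamma = np.zeros((n_frames, n_bands), dtype=complex)
    for l in range(n_frames):
        # Assumed update: P(l) = alpha * P(l-1) + (1 - alpha) * current term.
        p11 = alpha * p11 + (1.0 - alpha) * np.abs(X1[l]) ** 2
        p22 = alpha * p22 + (1.0 - alpha) * np.abs(X2[l]) ** 2
        p12 = alpha * p12 + (1.0 - alpha) * X1[l] * np.conj(X2[l])
        gamma[l] = p12 / np.sqrt(p11 * p22 + eps)
    return gamma

def mean_squared_modulus(gamma):
    """Mean of |gamma|^2 over the frequency bands of each frame."""
    return np.mean(np.abs(gamma) ** 2, axis=-1)

# Usage with synthetic spectra: a delayed copy versus an independent signal.
rng = np.random.default_rng(1)
shape = (200, 128)
X1 = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
X2_coherent = X1 * np.exp(1j * 0.3)
X2_independent = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

alpha_long, alpha_inst = 0.8, 0.5  # picked from the ranges in claim 3
msm_coherent = mean_squared_modulus(smoothed_coherence(X1, X2_coherent, alpha_long))
msm_independent = mean_squared_modulus(smoothed_coherence(X1, X2_independent, alpha_long))
msm_instant = mean_squared_modulus(smoothed_coherence(X1, X2_independent, alpha_inst))
```

In this sketch the mean squared modulus stays close to 1 for the coherent pair and drops well below 1 for the independent pair, which is the kind of auxiliary cue that claim 2 combines with the angle variance.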

4. The method of claim 1, wherein:

the obtaining of the sound source positioning angles of the i-th microphone and the j-th microphone in the l-th frame according to the complex coherence function values comprises determining the sound source positioning angles of the i-th microphone and the j-th microphone in the k-th sub-band of the l-th frame, the k-th sub-band being formed by combining at least one frequency band;

the determining of the sound source positioning angle variance in the l-th frame according to the sound source positioning angles comprises determining the sound source positioning angle variance in the k-th sub-band of the l-th frame according to the sound source positioning angles of the i-th microphone and the j-th microphone in the k-th sub-band of the l-th frame; and the determining of whether wind noise exists in the l-th frame according to the sound source positioning angle variance comprises determining whether wind noise exists in the k-th sub-band of the l-th frame according to the sound source positioning angle variance of the k-th sub-band of the l-th frame.

5. The method of claim 1, wherein M = 2;

the obtaining of the sound source positioning angles of the i-th microphone and the j-th microphone in the l-th frame according to the complex coherence function values comprises determining the sound source positioning angles of the 2 microphones in the k-th sub-band of the l-th frame, the k-th sub-band being formed by combining at least one frequency band;

the determining of the sound source positioning angle variance in the l-th frame according to the sound source positioning angles comprises determining the sound source positioning angle variance in the l-th frame according to the sound source positioning angles of the 2 microphones in the k-th sub-band of the l-th frame.

6. A method of wind noise suppression, the method comprising:

determining a wind noise gain function of the l-th frame according to the sound source positioning angle variance of the l-th frame;

and suppressing wind noise existing in the speech and/or audio signals according to the wind noise gain function of the l-th frame.
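
Purely as an illustration of the two steps of claim 6, the following sketch maps an angle variance to a gain and applies that gain to a short-time spectrum. The piecewise-linear mapping, the two variance knee points and the gain floor are hypothetical and are not the gain function of the patent, which this claim does not spell out; the function names are likewise invented for the example.

```python
import numpy as np

def wind_gain(angle_var, var_lo=100.0, var_hi=900.0, g_min=0.1):
    """Assumed mapping from angle variance (squared degrees) to a gain.

    Gain is 1 at or below var_lo, g_min at or above var_hi,
    and falls linearly in between.
    """
    v = np.asarray(angle_var, dtype=float)
    t = np.clip((v - var_lo) / (var_hi - var_lo), 0.0, 1.0)
    return 1.0 - t * (1.0 - g_min)

def suppress_wind(spectrum, angle_var, **gain_params):
    """Scale a spectrum by the wind noise gain.

    angle_var may be a scalar (one gain per frame) or an array with one
    value per sub-band, broadcast against the spectrum.
    """
    return wind_gain(angle_var, **gain_params) * spectrum

# Usage: four sub-bands with increasing angle variance.
spectrum = np.ones(4, dtype=complex)
variances = np.array([0.0, 500.0, 900.0, 5000.0])
cleaned = suppress_wind(spectrum, variances)
```

Keeping a non-zero gain floor is a common design choice in spectral suppression because it limits audible artefacts when a frame is misclassified.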

7. The method of claim 6, wherein the complex coherence function values include a first complex coherence function value, obtained by long-term complex coherence function estimation using a first smoothing factor, and a second complex coherence function value, obtained by instantaneous complex coherence function estimation using a second smoothing factor, the first smoothing factor being greater than the second smoothing factor; the method comprises determining a mean squared modulus of the complex coherence function in the l-th frame according to at least one of the first complex coherence function value and the second complex coherence function value; and the determining whether wind noise exists in the l-th frame according to the sound source positioning angle variance comprises determining whether wind noise exists in the l-th frame according to the sound source positioning angle variance with the aid of the mean squared modulus of the complex coherence function in the l-th frame.

8. The method of claim 7, wherein the first smoothing factor α₁ ∈ [0.7, 0.9] and the second smoothing factor α₂ ∈ [0.4, 0.6].

9. The method of claim 6, wherein:

the obtaining of the sound source positioning angles of the i-th microphone and the j-th microphone in the l-th frame according to the complex coherence function values comprises determining the sound source positioning angles of the i-th microphone and the j-th microphone in the k-th sub-band of the l-th frame, the k-th sub-band being formed by combining at least one frequency band;

10. The method of claim 6, wherein M = 2;

11. The method of claim 6, wherein the method comprises estimating a harmonic-to-noise ratio in the l-th frame, and determining a first speech protection gain function in the l-th frame according to the harmonic-to-noise ratio in the l-th frame; and the suppressing of wind noise existing in the speech and/or audio signals according to the wind noise gain function of the l-th frame comprises suppressing the wind noise existing in the speech and/or audio signals in combination with the first speech protection gain function of the l-th frame.

12. The method of claim 6, wherein the method comprises estimating a ratio of high-frequency energy to low-frequency energy in the l-th frame, and determining a second speech protection gain function in the l-th frame according to the ratio of high-frequency energy to low-frequency energy in the l-th frame; and the suppressing of wind noise existing in the speech and/or audio signals according to the wind noise gain function of the l-th frame comprises suppressing the wind noise existing in the speech and/or audio signals in combination with the second speech protection gain function of the l-th frame.

13. A wind noise detection apparatus, the apparatus comprising:

a receiving module configured to receive speech and/or audio signals from M microphones, wherein M is an integer greater than 1;

an estimation module configured to perform complex coherence function estimation on the speech and/or audio signals to obtain complex coherence function values of the short-time spectra of the i-th microphone and the j-th microphone in the k-th frequency band of the l-th frame, wherein l and k are natural numbers, i = 1, 2, ..., M, and j = 1, 2, ..., M;

an angle calculation module configured to obtain the sound source positioning angles of the i-th microphone and the j-th microphone in the l-th frame according to the complex coherence function values;

an angle variance determining module configured to determine the sound source positioning angle variance in the l-th frame according to the sound source positioning angles;

and a wind noise determining module configured to determine whether wind noise exists in the l-th frame according to the sound source positioning angle variance.

14. A wind noise suppression apparatus, characterized in that the apparatus comprises:

a receiving module configured to receive speech and/or audio signals from M microphones, wherein M is an integer greater than 1;

an estimation module configured to perform complex coherence function estimation on the speech and/or audio signals to obtain complex coherence function values of the short-time spectra of the i-th microphone and the j-th microphone in the k-th frequency band of the l-th frame, wherein l and k are natural numbers, i = 1, 2, ..., M, and j = 1, 2, ..., M;

a wind noise gain determining module configured to determine a wind noise gain function of the l-th frame according to the sound source positioning angle variance;

and a suppression module configured to suppress wind noise existing in the speech and/or audio signals according to the wind noise gain function of the l-th frame.

15. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method of any of claims 1 to 12.

16. A computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 12.