
CN109215677B - Wind noise detection and suppression method and device suitable for voice and audio - Google Patents

Info

Publication number
CN109215677B
CN109215677B CN201810935974.7A CN201810935974A CN109215677B CN 109215677 B CN109215677 B CN 109215677B CN 201810935974 A CN201810935974 A CN 201810935974A CN 109215677 B CN109215677 B CN 109215677B
Authority
CN
China
Prior art keywords
sound source
wind noise
ith
source positioning
microphone
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810935974.7A
Other languages
Chinese (zh)
Other versions
CN109215677A (en
Inventor
邱锋海
匡敬辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sound+ Technology Co ltd
Original Assignee
Beijing Sound+ Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sound+ Technology Co ltd
Priority to CN201810935974.7A
Publication of CN109215677A
Application granted
Publication of CN109215677B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0264: Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78: Detection of presence or absence of voice signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention provides a wind noise detection and suppression method. In one embodiment, the method comprises: performing complex coherence function estimation on speech or audio to obtain, for each speech frame, the sound source positioning angle of each pair of microphones among a plurality of microphones, and from these angles the sound source positioning angle variance of the frame; and determining whether wind noise exists in the frame according to the sound source positioning angle variance. Experiments show that the method can detect and suppress strong non-stationary wind noise in real time without obvious speech or audio distortion.

Description

Wind noise detection and suppression method and device suitable for voice and audio
Technical Field
The present application relates to the field of noise processing, and in particular to a method and an apparatus for detecting and suppressing wind noise in real time for voice and audio.
Background
Currently, conventional single-channel speech enhancement methods assume that the noise is stationary, so that the stationary noise power spectrum can be estimated by a noise estimation method and the stationary noise finally suppressed [1]. Multi-channel speech enhancement methods can exploit the spatial separation between the target speech and the interference noise to perform spatial filtering by means such as beamforming, and thereby suppress both stationary and non-stationary noise [2]. Current deep-learning-based methods require a large amount of computation, and their environmental adaptability and generalization capability still need improvement.
Wind noise is strongly non-stationary; moreover, it is not a point sound source and has no obvious directivity, so effective wind noise detection and suppression cannot be achieved with a traditional single-channel speech enhancement method or with a multi-channel speech enhancement method such as spatial filtering. In recent years, some scholars have proposed wind noise detection and suppression by deep learning or by non-negative matrix factorization [3]. These methods require a large amount of computation and generally cannot guarantee real-time performance; in particular, they are not generally applicable to real-time, low-power communication.
Disclosure of Invention
In a first aspect, an embodiment of the present invention provides a wind noise detection method. The method comprises the following steps: receiving speech and/or audio signals from M microphones, where M is an integer greater than 1; performing complex coherence function estimation on the speech and/or audio signals to obtain complex coherence function values of the k-th frequency band short-time spectrum of the l-th frame for the i-th microphone and the j-th microphone, where l and k are natural numbers, i = 1, 2, ..., M, j = 1, 2, ..., M, and k = 1, 2, ..., K; obtaining the sound source positioning angle of the i-th microphone and the j-th microphone in the l-th frame according to the complex coherence function values; determining the sound source positioning angle variance in the l-th frame according to the sound source positioning angles of the M microphones in the l-th frame; and determining whether wind noise exists in the l-th frame according to the sound source positioning angle variance.
In a second aspect, a wind noise suppression method is provided. The method comprises the following steps: receiving speech and/or audio signals from M microphones, where M is an integer greater than 1; performing complex coherence function estimation on the speech and/or audio signals to obtain complex coherence function values of the k-th frequency band short-time spectrum of the l-th frame for the i-th microphone and the j-th microphone, where l and k are natural numbers, i = 1, 2, ..., M, j = 1, 2, ..., M; obtaining the sound source positioning angle of the i-th microphone and the j-th microphone in the l-th frame according to the complex coherence function values; determining the sound source positioning angle variance in the l-th frame according to the sound source positioning angles; determining a wind noise gain function for the l-th frame according to the sound source positioning angle variance; and suppressing wind noise present in the speech and/or audio signals according to the wind noise gain function of the l-th frame.
In a third aspect, a wind noise detection apparatus is provided. The wind noise detection apparatus includes: a receiving module configured to receive speech and/or audio signals from M microphones, where M is an integer greater than 1; an estimation module configured to perform complex coherence function estimation on the speech and/or audio signals to obtain complex coherence function values of the k-th frequency band short-time spectrum of the l-th frame for the i-th microphone and the j-th microphone, where l and k are natural numbers, i = 1, 2, ..., M, j = 1, 2, ..., M; an angle calculation module configured to obtain the sound source positioning angle of the i-th microphone and the j-th microphone in the l-th frame according to the complex coherence function values; an angle variance determining module configured to determine the sound source localization angle variance in the l-th frame according to the sound source localization angles; and a wind noise determining module configured to determine whether wind noise exists in the l-th frame according to the sound source positioning angle variance.
In a fourth aspect, a wind noise suppression apparatus is provided. The wind noise suppression apparatus includes: a receiving module configured to receive speech and/or audio signals from M microphones, where M is an integer greater than 1; an estimation module configured to perform complex coherence function estimation on the speech and/or audio signals to obtain complex coherence function values of the k-th frequency band short-time spectrum of the l-th frame for the i-th microphone and the j-th microphone, where l and k are natural numbers, i = 1, 2, ..., M, j = 1, 2, ..., M; an angle calculation module configured to obtain the sound source positioning angle of the i-th microphone and the j-th microphone in the l-th frame according to the complex coherence function values; an angle variance determining module configured to determine the sound source localization angle variance in the l-th frame according to the sound source localization angles; a wind noise gain determining module configured to determine the wind noise gain function of the l-th frame according to the sound source positioning angle variance; and a suppression module configured to suppress wind noise present in the speech and/or audio signals according to the wind noise gain function of the l-th frame.
In a fifth aspect, a computer-readable storage medium is provided. The medium comprises instructions which, when run on a computer, cause the computer to perform the method according to the first or second aspect.
In a sixth aspect, a computer program product containing instructions is provided. When run on a computer, the instructions cause the computer to perform the method according to the first or second aspect.
According to the embodiment of the invention, wind noise is effectively suppressed by applying wind noise detection and wind noise suppression to the speech and/or audio signals, while distortion of speech and audio is avoided as far as possible so that speech and audio quality are preserved. Experiments show that the method can detect and suppress strong non-stationary wind noise in real time without obvious speech or audio distortion.
Drawings
FIG. 1 is a schematic diagram of a signal model;
FIG. 2 is a schematic block diagram of a multi-microphone wind noise suppression according to an embodiment of the present invention;
FIG. 3 compares the magnitude-squared complex coherence function of wind noise with that of directional target speech;
FIG. 4 is a schematic illustration of the sound source localization angle of wind noise and of directional target speech over time;
FIG. 5 is a schematic block diagram of full-band wind noise detection based on the magnitude squared and phase of the complex coherence function;
FIG. 6 is a schematic diagram of sub-band wind noise detection based on the magnitude squared and phase of the complex coherence function;
FIG. 7 is a schematic diagram of a gain function protection strategy based on speech harmonic characteristics and high and low frequency energy ratios;
FIG. 8 is a graph illustrating the processing of wind noise detection and suppression using an embodiment of the present invention; wherein (a) is a time domain map of speech contaminated by wind noise; (b) is a speech spectrogram polluted by wind noise; (c) is a speech time domain graph after wind noise suppression; (d) is a speech spectrogram with wind noise suppression.
Detailed Description
The embodiment of the invention provides a real-time wind noise detection and suppression method of low algorithmic complexity, which is suitable for voice and audio signals, can be applied to real-time voice and audio communication systems, and can also be applied to non-real-time voice and audio signal enhancement. The embodiment performs complex coherence function estimation on the speech or audio to obtain, for each speech frame, the sound source positioning angle of each pair of microphones among a plurality of microphones, and from these angles the sound source positioning angle variance of the frame; it then determines whether wind noise exists in the frame according to the sound source positioning angle variance.
In another embodiment of the present invention, a wind noise gain function at a speech or audio frame may be determined based on a sound source localization angle variance at the speech or audio frame; wind noise present in the speech and/or audio signal is then suppressed according to a wind noise gain function for the speech or audio frame.
When complex coherence function estimation is performed, a large smoothing factor tends to blur instantaneous changes; for example, it can cause distortion where the energy is rising, such as at the onset of speech. Conversely, if the smoothing factor is too small, valid speech is easily suppressed as wind noise where the energy is weak, such as at the end of an utterance. To this end, in one embodiment, dual or multiple smoothing factors, or an adaptive smoothing factor, may be employed for complex coherence function estimation. The complex coherence function estimate may be used to determine the mean magnitude-squared complex coherence function of the speech or audio frame, thereby assisting wind noise detection and suppression.
In one embodiment, a sub-band wind noise detection strategy may be employed: at least one of the K frequency bands is combined into a sub-band, and the sound source localization angles and sound source localization angle variances of each pair of the M microphones are then determined in multiple sub-bands of the speech or audio frame. Wind noise is thereby detected and suppressed.
In one embodiment, in view of the harmonic characteristics of voiced sounds, the harmonic-to-noise ratio of the l-th frame is used to protect voiced sounds.
In one embodiment, considering that wind noise has strong low-frequency energy while unvoiced sounds have strong high-frequency energy, unvoiced sounds can be protected using the high-to-low frequency energy ratio.
The present invention will be described below with reference to specific examples.
Fig. 1 is a schematic diagram of a signal model. Fig. 1 illustrates a variety of sound signals, including a target speech signal, a directional interferer, stationary noise, and wind noise. By way of example, the speech environment includes speakers A and B, both of whom are speaking. Assuming that the voice of speaker A is the target of the speech processing, the voice signal of speaker B constitutes directional interference noise with respect to the target speech signal. The same speech environment may also contain noise from, for example, a car, and possibly wind-induced noise. Automotive noise is a type of stationary noise. Wind noise has characteristics that differ from both directional interference noise and stationary noise, as discussed below.
In order to collect the target speech signal, M microphones are provided in the speech environment. Suppose that the signal $x_i(n)$ received by the i-th microphone is:
$$x_i(n) = s_i(n) + d_{t,i}(n) + d_{s,i}(n) + d_{w,i}(n) \tag{1}$$
where $s_i(n)$, $d_{t,i}(n)$, $d_{s,i}(n)$ and $d_{w,i}(n)$ are, respectively, the target speech or audio signal (hereinafter simply referred to as speech, for convenience of description), the directional interference, the stationary noise, and the wind noise received by the i-th microphone; $i = 1, 2, \ldots, M$, where M is the number of microphones.
In the multi-microphone case (M > 1), the directional interference noise $d_{t,i}(n)$ can be effectively suppressed by beamforming; the environmental noise can be suppressed by post-filtering; and the wind noise can then be effectively suppressed by the wind noise suppression method provided by the embodiment of the invention.
Fig. 2 is a schematic block diagram of the wind noise suppression principle of a multi-microphone. As shown in fig. 2, the wind noise suppression includes beamforming 21, post-processing 22, wind noise detection (23-24), wind noise gain estimation 25, and so on. In view of the protection of voiced and unvoiced sounds, harmonic noise ratio and high-low energy ratio estimation 26 may also be performed, and further wind noise gain function estimation 27 based on speech characteristics may be performed.
In the multi-microphone case (M > 1), beamforming applied to the speech signals $x_i(n)$ ($i = 1, 2, \ldots, M$) picked up by the M microphones can effectively suppress the directional interference noise $d_{t,i}(n)$. The beamforming may employ a fixed beamforming method or an adaptive beamforming method. Fixed beamforming methods include Delay-and-Sum Beamforming (DSB), Delay-and-Filtering Beamforming (DFB), and Robust Superdirective Beamforming (RSB). Adaptive beamforming algorithms include the Generalized Sidelobe Canceller (GSC), the Minimum Variance Distortionless Response (MVDR) method, the Multi-channel Wiener Filtering (MWF) method, and the like.
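As an illustration of the simplest fixed beamformer named above, the following is a minimal delay-and-sum sketch. It assumes the per-channel delays are already known as whole numbers of samples; the function name `delay_and_sum` and the use of a circular shift are choices made for this example and are not taken from the patent.

```python
import numpy as np

def delay_and_sum(signals, delays):
    """Align each channel by an integer sample delay and average.

    signals: array of shape (M, N), one row per microphone.
    delays:  length-M integer delays (in samples) of each channel
             relative to the reference channel; assumed known here.
    """
    signals = np.asarray(signals, dtype=float)
    m, n = signals.shape
    out = np.zeros(n)
    for channel, d in zip(signals, delays):
        # np.roll is a circular shift; acceptable for a sketch, but a real
        # implementation would zero-pad instead of wrapping around.
        out += np.roll(channel, -int(d))
    return out / m
```

Averaging the aligned channels keeps the target (which adds coherently) at unit gain while uncorrelated components are attenuated.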
In the post-processing step, the signal after the beam forming can be filtered by adopting a filtering method to eliminate the environmental noise. The multi-channel post-filtering method can adopt a coherence-based method, an energy-based method or a combination of the two methods, and the like, and can also adopt a spectral subtraction method, a subspace method and the like for the purpose of steady-state noise suppression. In one example, the residual directional noise and stationary noise are preprocessed using spectral subtraction in consideration of stability and computation amount.
Let x(n) be the output signal obtained from the signals received by the multiple microphones after beamforming and post-filtering. Neglecting residual stationary noise:
$$x(n) = s(n) + d_w(n) \tag{2}$$
where $s(n)$ is the estimated target speech component and $d_w(n)$ is the residual wind noise. The frequency-domain representation of equation (2) is:
$$X(k,l) = S(k,l) + D_w(k,l) \tag{3}$$
where $X(k,l)$, $S(k,l)$ and $D_w(k,l)$ are the k-th frequency band short-time spectra of the l-th frame of $x(n)$, $s(n)$ and $d_w(n)$, respectively, and can be computed with the Fast Fourier Transform (FFT). Similarly, an FFT is applied to the signal $x_i(n)$ received by the i-th microphone, and the corresponding k-th frequency band short-time spectrum of the l-th frame is $X_i(k,l)$.
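The short-time spectra $X(k,l)$ can be obtained with a framed FFT, sketched below. The frame length, hop size, and Hann window are assumptions made for this example; the patent text does not specify them.

```python
import numpy as np

def stft(x, frame_len=512, hop=256):
    """Return X[k, l]: the k-th band of the l-th frame of signal x."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)           # assumed analysis window
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[l * hop: l * hop + frame_len] * window for l in range(n_frames)],
        axis=1,
    )
    # rfft keeps bands 0 .. frame_len // 2 of each real-valued frame.
    return np.fft.rfft(frames, axis=0)
```

Applying the same function to each microphone signal $x_i(n)$ gives the per-channel spectra $X_i(k,l)$ used in the coherence estimate.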
After beamforming and post-processing, wind noise detection is performed on the signal. Fig. 5 is a schematic block diagram of full-band wind noise detection based on the magnitude squared and phase of the complex coherence function.
First, complex coherence function estimation with double smoothing factors is performed. In view of the strongly non-stationary nature and high energy of wind noise, a method for estimating the complex coherence function with double smoothing factors is proposed, which avoids the influence of wind noise on weak speech and audio.
The traditional complex coherence function estimation method adopts a fixed smoothing factor, namely:
Figure BDA0001767851360000061
where $C_{ij}(k,l)$ is the complex coherence function value of the k-th frequency band short-time spectrum of the l-th frame for the i-th microphone and the j-th microphone; the denominator includes a small quantity greater than 0 to avoid division by zero; l and k are natural numbers, $i = 1, 2, \ldots, M$, $j = 1, 2, \ldots, M$.
Figure BDA0001767851360000062
is the auto-power spectrum or cross-power spectrum: when i = j, $R_{ij}(k,l) = R_{ii}(k,l) = R_{jj}(k,l)$ is the auto-power spectrum; when i ≠ j, it is the cross-power spectrum. α is a fixed smoothing factor, generally in the range 0 to 1. To reduce the estimation bias, α should be close to 1, but this distorts the onsets of speech and audio; in particular, when the wind noise energy is high, it distorts the speech and audio signals for a long time.
In one embodiment, the problem of a fixed smoothing factor is solved with a double smoothing factor strategy, in which a large smoothing factor $\alpha_1 \in [0.7, 0.9]$ is used to estimate the long-term complex coherence function and a small smoothing factor $\alpha_2 \in [0.4, 0.6]$ is used to estimate the instantaneous complex coherence function. The long-term complex coherence function and the instantaneous complex coherence function are, respectively,
Figure BDA0001767851360000063
And
Figure BDA0001767851360000064
both of which are calculated using equation (4), except that
Figure BDA0001767851360000065
is replaced by the large smoothing factor $\alpha_1$ and the small smoothing factor $\alpha_2$, respectively. Of course, three or more smoothing factors may also be used to estimate the complex coherence function several times in order to protect the speech and audio signals; this is a simple extension of the double smoothing factor strategy and also falls within the scope of protection of this patent.
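The double smoothing factor idea can be sketched as follows. Because the equation images are not available in this text, the sketch assumes the standard textbook form: first-order recursive averaging of the auto- and cross-power spectra, and coherence defined as the cross-power spectrum divided by the geometric mean of the auto-power spectra. The names `complex_coherence`, `dual_coherence`, and `eps` are hypothetical, and the placement of `eps` in the denominator is an assumption.

```python
import numpy as np

def complex_coherence(Xi, Xj, alpha, eps=1e-12):
    """Recursively smoothed complex coherence of two STFTs of shape (K, L)."""
    K, L = Xi.shape
    Rii = np.zeros(K)
    Rjj = np.zeros(K)
    Rij = np.zeros(K, dtype=complex)
    C = np.zeros((K, L), dtype=complex)
    for l in range(L):
        # First-order recursive averaging of auto- and cross-power spectra.
        Rii = alpha * Rii + (1 - alpha) * np.abs(Xi[:, l]) ** 2
        Rjj = alpha * Rjj + (1 - alpha) * np.abs(Xj[:, l]) ** 2
        Rij = alpha * Rij + (1 - alpha) * Xi[:, l] * np.conj(Xj[:, l])
        # eps is a small positive constant that avoids division by zero.
        C[:, l] = Rij / (np.sqrt(Rii * Rjj) + eps)
    return C

def dual_coherence(Xi, Xj, alpha_long=0.8, alpha_inst=0.5):
    """Long-term and instantaneous estimates, using values inside the
    ranges [0.7, 0.9] and [0.4, 0.6] given in the text."""
    return (complex_coherence(Xi, Xj, alpha_long),
            complex_coherence(Xi, Xj, alpha_inst))
```

For a coherent source both estimates have magnitude close to 1; for uncorrelated inputs the long-term estimate settles closer to 0 than the instantaneous one, which reacts faster but is noisier.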
Further, in one embodiment, an adaptive smoothing factor α may be employed. For example, α is adjusted adaptively according to the stationarity of the signal picked up by the microphones, the absolute value of the complex coherence function, or the signal-to-wind-noise ratio, so as to protect the speech and audio signals; this is also covered by the present patent.
Next, full-band wind noise detection is performed based on the magnitude squared and phase of the complex coherence function.
Wind noise differs from directional target speech and audio sources in two respects. First, the magnitude-squared complex coherence function of wind noise between different microphones is close to 0, while that of directional target speech or audio is close to 1. Fig. 3 plots the magnitude-squared complex coherence function of wind noise and of a directional target speech source, and confirms this difference. Second, wind noise is not a directional sound source, so its sound source localization angle varies irregularly between adjacent frames; for a directional sound source, the direction of the point source can be found by a sound source localization method, and the localization angle is continuous from frame to frame; environmental noise close to a diffuse field has a zero-phase characteristic and likewise shows good inter-frame continuity of the localization angle. Fig. 4 shows the localization angle of wind noise and of directional target speech over time: in the first 5 seconds the target speech source is localized at 0 degrees and the angle jitters only in silent segments, whereas in seconds 6 to 10, where wind noise is present, the localization angle jumps widely. Wind noise can therefore be detected from the continuity of the sound source direction, and the sound source angle variance can be computed over multiple frames; when the variance exceeds a certain threshold, the current frame can be preliminarily judged to be wind noise.
Wind noise is then detected by a double-threshold decision that additionally uses the magnitude squared of the complex coherence function; the schematic block diagram is shown in fig. 5. The specific steps are as follows:
1) performing pairwise microphone sound source positioning based on the complex coherence function, using the GCC-PHAT (Generalized Cross-correlation Phase Transform) method, the SCOT-PHAT (Standard Cross-correlation Phase Transform) method, or the like; let the sound source positioning angle of the i-th microphone and the j-th microphone in the l-th frame be $\theta_{ij}(l)$. All or part of the frequency band of the complex coherence function may be used for sound source localization.
2) calculating the sound source localization angle variance in the l-th frame, i.e., the variance of the M(M-1)/2 values $\theta_{ij}(l)$:
Figure BDA0001767851360000071
wherein,
Figure BDA0001767851360000072
3) calculating the mean magnitude-squared complex coherence function of each frame:
Figure BDA0001767851360000073
where $k_{low}$ and $k_{up}$ are the lower and upper limits of the frequency band, respectively; the minimum value of $k_{low}$ is 0, and the maximum value of $k_{up}$ is 1/2 of the FFT frame length;
4) performing wind noise detection according to the mean magnitude-squared complex coherence function and the sound source positioning angle variance, with the following decision rule:
Figure BDA0001767851360000081
wherein,
Figure BDA0001767851360000082
and
Figure BDA0001767851360000083
are thresholds; q(l) = 1 indicates that the l-th frame is wind noise.
The mean magnitude-squared complex coherence function can assist wind noise detection based on the sound source positioning angle variance, and helps reduce the speech distortion that such detection might otherwise introduce.
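The four steps above can be sketched as follows. Since equations (5) to (7) are not available in this text, several things here are assumptions rather than the patent's formulas: the far-field angle model with a known microphone spacing, the population variance, averaging the magnitude-squared coherence over all pairs, the rule "wind when the angle variance is high and the mean coherence is low", and both threshold values. All function names are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def pair_angle(C, spacing, fs, n_fft):
    """GCC-PHAT style angle (degrees) for one microphone pair and one frame.

    C: complex coherence (or cross-spectrum), shape (n_fft // 2 + 1,).
    spacing: distance between the two microphones in metres (assumed known).
    """
    phat = C / (np.abs(C) + 1e-12)             # PHAT weighting: keep phase only
    cc = np.fft.irfft(phat, n=n_fft)           # generalized cross-correlation
    max_lag = max(1, int(np.ceil(spacing * fs / SPEED_OF_SOUND)))
    lags = np.arange(-max_lag, max_lag + 1)
    tau = lags[np.argmax(cc[lags])] / fs       # negative lags index from the end
    sin_theta = np.clip(tau * SPEED_OF_SOUND / spacing, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))

def angle_variance(angles):
    """Variance of the M(M-1)/2 pairwise angles of one frame."""
    return float(np.var(np.asarray(angles, dtype=float)))

def mean_msc(C_pairs, k_low, k_up):
    """Mean magnitude-squared coherence over bands k_low..k_up and all pairs."""
    C_pairs = np.asarray(C_pairs)
    return float(np.mean(np.abs(C_pairs[:, k_low:k_up + 1]) ** 2))

def is_wind(angles, C_pairs, k_low, k_up, var_th=100.0, msc_th=0.5):
    """Assumed double-threshold rule: erratic angles AND low coherence."""
    erratic = angle_variance(angles) > var_th
    incoherent = mean_msc(C_pairs, k_low, k_up) < msc_th
    return int(erratic and incoherent)
```

Requiring both conditions means a frame with a stable direction or high coherence is never flagged, which matches the stated role of the coherence term as a safeguard against speech distortion.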
Fig. 6 presents an alternative to fig. 5: sub-band wind noise detection based on the magnitude squared and phase of the complex coherence function.
When there are only two microphones, that is, M = 2, the variance cannot be calculated with equation (5), and the decision in equation (7) can rely only on the mean magnitude-squared complex coherence function. To solve this problem and to reduce speech distortion in frequency bands unaffected by wind noise, a sub-band wind noise detection strategy is further proposed, which exploits the irregularity of the wind noise source localization angle between adjacent sub-bands. The specific steps are as follows:
1) combining multiple frequency bands into a sub-band, e.g. one sub-band every 500 Hz, and performing sound source localization on each sub-band, where the sound source localization angle of each sub-band is $\theta_{ij}(\kappa,l)$ and κ denotes the sub-band index;
2) calculating the variance of the M(M-1)/2 values $\theta_{ij}(\kappa,l)$:
Figure BDA0001767851360000084
Wherein,
Figure BDA0001767851360000085
when M = 2, there is only one set $\theta_{12}(\kappa,l)$, and the variance is calculated as follows:
Figure BDA0001767851360000086
wherein,
Figure BDA0001767851360000087
$\kappa_{up}$ is the upper bound of the sub-band index. When M = 2, the sound source angle variance used for wind noise detection in equation (7) can be calculated by equations (8) and (9);
3) calculating the sub-band mean magnitude-squared complex coherence functions $C_a(\kappa,l)$ and $C_b(\kappa,l)$ in the same way as equation (6), except that $k_{low}$ and $k_{up}$ are set to the limits of each sub-band;
4) performing sub-band wind noise detection according to the sub-band mean magnitude-squared complex coherence function and the sub-band sound source angle variance, with the following decision rule:
Figure BDA0001767851360000091
wherein,
Figure BDA0001767851360000092
and
Figure BDA0001767851360000093
are thresholds; q(κ, l) = 1 indicates that the κ-th sub-band of the l-th frame is wind noise.
The sub-band wind noise detection scheme achieves strong suppression of wind noise while avoiding distortion of speech and audio in frequency bands free of wind noise.
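A sketch of the two-microphone sub-band variant is given below. The 500 Hz sub-band width comes from the text; everything else is an assumption made for illustration, since equations (8) to (10) are not available here: the variance is taken across the sub-band angles of one frame, a sub-band is flagged when that variance is high and its own mean magnitude-squared coherence is low, and the thresholds and function names are hypothetical.

```python
import numpy as np

def subband_edges(fs, n_fft, width_hz=500.0):
    """Split FFT bins 0..n_fft//2 into consecutive sub-bands of about width_hz.

    Returns a list of inclusive (k_low, k_up) bin ranges.
    """
    bins_per_band = max(1, int(round(width_hz * n_fft / fs)))
    n_bins = n_fft // 2 + 1
    return [(k, min(k + bins_per_band, n_bins) - 1)
            for k in range(0, n_bins, bins_per_band)]

def subband_wind_flags(theta, C, edges, var_th=100.0, msc_th=0.5):
    """Per-sub-band wind flags q[kappa] for one frame of a 2-microphone pair.

    theta: localization angle per sub-band, in degrees.
    C:     complex coherence of the frame, shape (n_fft // 2 + 1,).
    """
    var = np.var(np.asarray(theta, dtype=float))   # spread across sub-bands
    msc = np.array([np.mean(np.abs(C[lo:hi + 1]) ** 2) for lo, hi in edges])
    # Assumed rule: erratic angles overall AND low coherence in this sub-band.
    return ((var > var_th) & (msc < msc_th)).astype(int)
```

Because the coherence test is applied per sub-band, bands that remain coherent are left unflagged even when the frame as a whole contains wind noise.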
When wind noise is detected, the harmonic noise ratio and the high-low frequency energy ratio can be combined to protect the speech while the wind noise is suppressed.
Fig. 7 is a schematic diagram of a gain function protection strategy based on speech harmonic characteristics and high-low frequency energy ratio. In the upper part of fig. 7, the process of protecting voiced sounds with harmonic noise ratio is illustrated.
Voiced sounds have harmonic characteristics. To extract the fundamental frequency of the harmonics, a cepstrum-based method or a frequency-domain method may be employed. The specific steps are as follows.
1) first, calculating the power cepstrum coefficients $c(\lambda,l)$, where λ is the cepstrum coefficient index;
2) judging whether the current frame is voiced according to $c(\lambda,l)$, with the following decision rule:
Figure BDA0001767851360000094
where $c_{th}$ is a threshold, and $\lambda_{min}$ and $\lambda_{max}$ are the minimum and maximum cepstrum coefficient indices corresponding to the voiced fundamental frequency;
3) setting the maximum coefficient of $c(\lambda,l)$ between $\lambda_{min}$ and $\lambda_{max}$ to 0 and estimating the harmonic noise ratio as HNR(k, l); the value of the first speech protection gain function in the k-th band of the l-th frame is:
Figure BDA0001767851360000095
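Steps 1) and 2) above can be sketched as a cepstral voicing test. The sketch assumes the usual definition of the power cepstrum (inverse FFT of the log power spectrum) and a pitch search range of 60 to 400 Hz; the threshold `c_th=0.5` is chosen for the synthetic example only, and the function name is hypothetical. The harmonic noise ratio and the gain of step 3) are not attempted here because their formulas are not available in this text.

```python
import numpy as np

def is_voiced(frame, fs, c_th=0.5, f0_min=60.0, f0_max=400.0):
    """Cepstral voicing decision for one time-domain frame."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    log_power = np.log(np.abs(spectrum) ** 2 + 1e-12)
    cepstrum = np.fft.irfft(log_power)          # power cepstrum c(lambda)
    # Quefrency range corresponding to the assumed pitch range.
    lam_min = int(fs / f0_max)
    lam_max = min(int(fs / f0_min), len(cepstrum) // 2)
    # Voiced if the cepstral peak in the pitch range exceeds the threshold.
    return bool(np.max(cepstrum[lam_min:lam_max + 1]) > c_th)
```

A harmonic signal produces a periodic ripple in the log spectrum and therefore a pronounced cepstral peak at the pitch period, while wind noise and other aperiodic signals do not.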
In terms of long-term spectral characteristics, wind noise generally has stronger low-frequency energy, whereas unvoiced sounds generally have stronger high-frequency energy. The lower part of fig. 7 illustrates the process of protecting unvoiced sounds according to the high-low frequency energy ratio. The specific steps are as follows.
1) calculating the high-frequency energy
Figure BDA0001767851360000096
And low frequency energy
Figure BDA0001767851360000097
where $K_{min}$, $K_{mid}$ and $K_{high}$ are, respectively, the minimum band, the middle band and the highest band used for calculating the low-frequency and high-frequency energy;
2) calculating the high-low frequency energy ratio of each frame, $PR(l) = P_{high}(l)/P_{low}(l)$;
3) protecting unvoiced speech according to PR(l); the value of the second speech protection gain function in the k-th band of the l-th frame is:
Figure BDA0001767851360000101
where $PR_{th}$ is a threshold. $G_{p,2}(k,l)$ takes the value 1 only in the mid-to-high frequency bands where unvoiced sounds may occur.
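The unvoiced protection step can be sketched as below. The ratio PR(l) = P_high(l)/P_low(l) and the fact that the gain is 1 only in mid-to-high bands come from the text; the exact summation limits (low energy over $K_{min}$ to just below $K_{mid}$, high energy over $K_{mid}$ to $K_{high}$), the default threshold, and the function name are assumptions, since equations (12) and (13) are not available here.

```python
import numpy as np

def unvoiced_protection_gain(X_frame, k_min, k_mid, k_high, pr_th=1.0):
    """Return (PR, G_p2) for one frame of spectrum X_frame (one value per band)."""
    power = np.abs(np.asarray(X_frame)) ** 2
    p_low = np.sum(power[k_min:k_mid])          # assumed low-frequency range
    p_high = np.sum(power[k_mid:k_high + 1])    # assumed high-frequency range
    pr = p_high / (p_low + 1e-12)
    gain = np.zeros(len(power))
    if pr > pr_th:
        # Protect only the mid-to-high bands where unvoiced sounds may occur.
        gain[k_mid:k_high + 1] = 1.0
    return pr, gain
```

Frames dominated by low-frequency energy, as is typical of wind noise, get no protection, so the wind noise gain can still suppress them.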
When full-band wind noise detection is employed, the wind noise gain function is $G_1(k,l) = 1 - q(l)$, i.e., the value 1 - q(l) is assigned to the gain function of all bands in the l-th frame. When sub-band wind noise detection is employed, the wind noise gain function is $G_1(k,l) = 1 - q(\kappa,l)$, i.e., the value 1 - q(κ, l) is assigned to the gain function of the bands in the κ-th sub-band of the l-th frame.
Together with the two speech protection strategies above, the final wind noise suppression gain function is formed as follows:
$$G_w(k,l) = \max\{G_1(k,l),\, G_{p,1}(k,l),\, G_{p,2}(k,l)\} \tag{14}$$
The finally output enhanced speech time-domain signal is obtained through the inverse FFT and overlap-add, namely:
Figure BDA0001767851360000102
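The gain combination of equation (14) and the synthesis step can be sketched as follows for the full-band case. The maximum over the three gains follows equation (14), and $G_1 = 1 - q$ follows the text; the absence of a synthesis window and the function name `suppress` are assumptions of this example, since the image of equation (15) is not available here.

```python
import numpy as np

def suppress(X, q, gp1, gp2, frame_len, hop):
    """Apply G_w = max(G_1, G_p1, G_p2) and resynthesize by overlap-add.

    X:        spectra of shape (frame_len // 2 + 1, L).
    q:        full-band wind flags, shape (L,), 1 = wind noise.
    gp1, gp2: speech protection gains, same shape as X.
    """
    g1 = np.broadcast_to(1.0 - np.asarray(q, dtype=float), X.shape)
    gw = np.maximum(g1, np.maximum(gp1, gp2))   # equation (14)
    Y = gw * X
    n_frames = X.shape[1]
    out = np.zeros(hop * (n_frames - 1) + frame_len)
    for l in range(n_frames):
        # Inverse FFT of each frame, added at its position in the output.
        out[l * hop: l * hop + frame_len] += np.fft.irfft(Y[:, l], n=frame_len)
    return out
```

Taking the maximum means a band is attenuated only when the wind detector flags it and neither protection strategy claims it as speech. With an analysis window whose shifted copies sum to one (for example a periodic Hann window at 50% overlap), frames with unit gain are reconstructed exactly away from the signal edges.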
Fig. 8 illustrates the result of wind noise detection and suppression according to an embodiment of the present invention, where (a) is the time-domain plot of speech contaminated by wind noise; (b) is the spectrogram of speech contaminated by wind noise; (c) is the time-domain plot after wind noise suppression; and (d) is the spectrogram after wind noise suppression. The first 5 seconds are speech and the last 5 seconds are wind noise. As can be seen from fig. 8, the speech signal is preserved and the wind noise is effectively suppressed.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be implemented by a program, and the program may be stored in a computer-readable storage medium, where the storage medium is a non-transitory medium such as a random access memory, a read-only memory, a flash memory, a hard disk, a solid state disk, a magnetic tape, a floppy disk, an optical disk, or any combination thereof.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.
Cited documents:
[1] Loizou P C. Speech enhancement: theory and practice [M]. CRC Press, 2013.
[2] Ward D, Brandstein M. Microphone arrays: signal processing techniques and applications [J]. 2001.
[3] Schmidt M N, Larsen J, Hsiao F T. "Wind noise reduction using non-negative sparse coding", in IEEE Workshop on Machine Learning for Signal Processing, Aug. 27-29, 2007.

Claims (16)

1. A method of wind noise detection, the method comprising:
receiving speech and/or audio signals from M microphones, where M is an integer greater than 1;
performing complex coherence function estimation on the speech and/or audio signals to obtain complex coherence function values of the kth frequency band short-time spectra of the lth frame for the ith microphone and the jth microphone, wherein l and k are natural numbers, i = 1, 2, ..., M, and j = 1, 2, ..., M;
obtaining the sound source localization angles of the ith microphone and the jth microphone in the lth frame according to the complex coherence function values;
determining the sound source localization angle variance in the lth frame according to the sound source localization angles;
and determining whether wind noise exists in the lth frame according to the sound source localization angle variance.
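The steps recited in claim 1 can be sketched for M = 2 as follows. This is a minimal sketch under assumptions that the claim itself does not state: the complex coherence is taken as the cross-spectrum normalized by the two auto-spectra, the localization angle is derived from the coherence phase with a far-field model, the variance is taken across frequency bands of one frame, and wind noise is declared when that variance exceeds a threshold. All function names, the speed of sound default, and the threshold are hypothetical.

```python
import cmath
import math


def coherence(phi_12, phi_11, phi_22):
    """Complex coherence from a cross-spectrum and two auto-spectra."""
    return phi_12 / math.sqrt(max(phi_11 * phi_22, 1e-12))


def doa_angle(gamma, freq_hz, mic_dist_m, c=343.0):
    """Localization angle in degrees from the coherence phase.

    Assumes a far-field source, so the phase equals
    2*pi*f*d*cos(theta)/c; the argument is clipped to [-1, 1]
    because wind noise can produce physically impossible phases.
    """
    x = c * cmath.phase(gamma) / (2.0 * math.pi * freq_hz * mic_dist_m)
    return math.degrees(math.acos(max(-1.0, min(1.0, x))))


def angle_variance(angles):
    """Variance of the localization angles across bands of one frame."""
    mean = sum(angles) / len(angles)
    return sum((a - mean) ** 2 for a in angles) / len(angles)


def is_wind(angles, var_th):
    """Declare wind noise when the angle variance exceeds var_th."""
    return angle_variance(angles) > var_th
```

The intuition is that a real acoustic source produces a consistent angle estimate in every band, whereas wind turbulence at each microphone is largely uncorrelated, so the per-band angles scatter and their variance grows.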
2. The method of claim 1, wherein the complex coherence function values include a first complex coherence function value, obtained by long-term complex coherence function estimation using a first smoothing factor, and a second complex coherence function value, obtained by instantaneous complex coherence function estimation using a second smoothing factor, the first smoothing factor being greater than the second smoothing factor; the method further comprises determining a mean squared modulus of the complex coherence function for the lth frame according to at least one of the first complex coherence function value and the second complex coherence function value; and the determining whether wind noise exists in the lth frame according to the sound source localization angle variance comprises determining whether wind noise exists in the lth frame according to the sound source localization angle variance together with the mean squared modulus of the complex coherence function for the lth frame.
3. The method of claim 2, wherein the first smoothing factor α1 ∈ [0.7, 0.9] and the second smoothing factor α2 ∈ [0.4, 0.6].
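The effect of the two smoothing factors can be illustrated with a short sketch. It assumes first-order recursive smoothing of the form S(l) = α·S(l-1) + (1-α)·X(l), which is a common way to estimate spectra over time but is an assumption here, as the claim does not spell out the recursion. The names `recursive_smooth` and `track` and the default values 0.8 and 0.5 (taken from inside the claimed ranges) are hypothetical.

```python
def recursive_smooth(prev, current, alpha):
    """One step of first-order recursive smoothing (assumed form)."""
    return alpha * prev + (1.0 - alpha) * current


def track(values, alpha_long=0.8, alpha_inst=0.5):
    """Smooth a sequence with a long-term and an instantaneous factor.

    Returns two lists: the long-term estimate (larger factor, slower
    response) and the instantaneous estimate (smaller factor, faster
    response), both starting from an initial state of 0.
    """
    long_term, instantaneous = [], []
    s_long = 0.0
    s_inst = 0.0
    for v in values:
        s_long = recursive_smooth(s_long, v, alpha_long)
        s_inst = recursive_smooth(s_inst, v, alpha_inst)
        long_term.append(s_long)
        instantaneous.append(s_inst)
    return long_term, instantaneous
```

With a step input, the estimate using the smaller factor reaches the new level sooner, which is why it is described as instantaneous, while the larger factor gives a steadier long-term estimate.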
4. The method of claim 1, wherein:
obtaining the sound source localization angles of the ith microphone and the jth microphone in the lth frame according to the complex coherence function values comprises determining the sound source localization angles of the ith microphone and the jth microphone in the kth sub-band of the lth frame, the kth sub-band being formed by combining at least one frequency band;
determining the sound source localization angle variance in the lth frame according to the sound source localization angles comprises determining the sound source localization angle variance in the kth sub-band of the lth frame according to the sound source localization angles of the ith microphone and the jth microphone in the kth sub-band of the lth frame; and determining whether wind noise exists in the lth frame according to the sound source localization angle variance comprises determining whether wind noise exists in the kth sub-band of the lth frame according to the sound source localization angle variance of the kth sub-band of the lth frame.
5. The method of claim 1, wherein M = 2;
obtaining the sound source localization angles of the ith microphone and the jth microphone in the lth frame according to the complex coherence function values comprises determining the sound source localization angles of the 2 microphones in the kth sub-band of the lth frame, the kth sub-band being formed by combining at least one frequency band;
determining the sound source localization angle variance in the lth frame according to the sound source localization angles comprises determining the sound source localization angle variance in the lth frame according to the sound source localization angles of the 2 microphones in the kth sub-band of the lth frame.
6. A method of wind noise suppression, the method comprising:
receiving speech and/or audio signals from M microphones, where M is an integer greater than 1;
performing complex coherence function estimation on the speech and/or audio signals to obtain complex coherence function values of the kth frequency band short-time spectra of the lth frame for the ith microphone and the jth microphone, wherein l and k are natural numbers, i = 1, 2, ..., M, and j = 1, 2, ..., M;
obtaining the sound source localization angles of the ith microphone and the jth microphone in the lth frame according to the complex coherence function values;
determining the sound source localization angle variance in the lth frame according to the sound source localization angles;
determining a wind noise gain function of the lth frame according to the sound source localization angle variance of the lth frame;
and suppressing wind noise present in the speech and/or audio signals according to the wind noise gain function of the lth frame.
7. The method of claim 6, wherein the complex coherence function values include a first complex coherence function value, obtained by long-term complex coherence function estimation using a first smoothing factor, and a second complex coherence function value, obtained by instantaneous complex coherence function estimation using a second smoothing factor, the first smoothing factor being greater than the second smoothing factor; the method further comprises determining a mean squared modulus of the complex coherence function for the lth frame according to at least one of the first complex coherence function value and the second complex coherence function value; and the determining whether wind noise exists in the lth frame according to the sound source localization angle variance comprises determining whether wind noise exists in the lth frame according to the sound source localization angle variance together with the mean squared modulus of the complex coherence function for the lth frame.
8. The method of claim 7, wherein the first smoothing factor α1 ∈ [0.7, 0.9] and the second smoothing factor α2 ∈ [0.4, 0.6].
9. The method of claim 6, wherein:
obtaining the sound source localization angles of the ith microphone and the jth microphone in the lth frame according to the complex coherence function values comprises determining the sound source localization angles of the ith microphone and the jth microphone in the kth sub-band of the lth frame, the kth sub-band being formed by combining at least one frequency band;
determining the sound source localization angle variance in the lth frame according to the sound source localization angles comprises determining the sound source localization angle variance in the kth sub-band of the lth frame according to the sound source localization angles of the ith microphone and the jth microphone in the kth sub-band of the lth frame; and determining whether wind noise exists in the lth frame according to the sound source localization angle variance comprises determining whether wind noise exists in the kth sub-band of the lth frame according to the sound source localization angle variance of the kth sub-band of the lth frame.
10. The method of claim 6, wherein M = 2;
obtaining the sound source localization angles of the ith microphone and the jth microphone in the lth frame according to the complex coherence function values comprises determining the sound source localization angles of the 2 microphones in the kth sub-band of the lth frame, the kth sub-band being formed by combining at least one frequency band;
determining the sound source localization angle variance in the lth frame according to the sound source localization angles comprises determining the sound source localization angle variance in the lth frame according to the sound source localization angles of the 2 microphones in the kth sub-band of the lth frame.
11. The method of claim 6, wherein the method includes estimating a harmonic-to-noise ratio for the lth frame, and determining a first speech protection gain function for the lth frame based on the harmonic-to-noise ratio of the lth frame; and the suppressing of wind noise present in the speech and/or audio signals according to the wind noise gain function of the lth frame includes suppressing the wind noise present in the speech and/or audio signals in conjunction with the first speech protection gain function of the lth frame.
12. The method of claim 6, wherein the method includes estimating a high-to-low frequency energy ratio for the lth frame, and determining a second speech protection gain function for the lth frame based on the high-to-low frequency energy ratio of the lth frame; and the suppressing of wind noise present in the speech and/or audio signals according to the wind noise gain function of the lth frame includes suppressing the wind noise present in the speech and/or audio signals in conjunction with the second speech protection gain function of the lth frame.
13. A wind noise detection apparatus, the apparatus comprising:
a receiving module configured to receive speech and/or audio signals from M microphones, wherein M is an integer greater than 1;
an estimation module configured to perform complex coherence function estimation on the speech and/or audio signals to obtain complex coherence function values of the kth frequency band short-time spectra of the lth frame for the ith microphone and the jth microphone, wherein l and k are natural numbers, i = 1, 2, ..., M, and j = 1, 2, ..., M;
an angle calculation module configured to obtain the sound source localization angles of the ith microphone and the jth microphone in the lth frame according to the complex coherence function values;
an angle variance determining module configured to determine the sound source localization angle variance in the lth frame according to the sound source localization angles;
and a wind noise determining module configured to determine whether wind noise exists in the lth frame according to the sound source localization angle variance.
14. A wind noise suppression apparatus, the apparatus comprising:
a receiving module configured to receive speech and/or audio signals from M microphones, wherein M is an integer greater than 1;
an estimation module configured to perform complex coherence function estimation on the speech and/or audio signals to obtain complex coherence function values of the kth frequency band short-time spectra of the lth frame for the ith microphone and the jth microphone, wherein l and k are natural numbers, i = 1, 2, ..., M, and j = 1, 2, ..., M;
an angle calculation module configured to obtain the sound source localization angles of the ith microphone and the jth microphone in the lth frame according to the complex coherence function values;
an angle variance determining module configured to determine the sound source localization angle variance in the lth frame according to the sound source localization angles;
a wind noise gain determining module configured to determine a wind noise gain function of the lth frame according to the sound source localization angle variance;
and a suppression module configured to suppress wind noise present in the speech and/or audio signals according to the wind noise gain function of the lth frame.
15. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method of any of claims 1 to 12.
16. A computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 12.
CN201810935974.7A 2018-08-16 2018-08-16 Wind noise detection and suppression method and device suitable for voice and audio Active CN109215677B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810935974.7A CN109215677B (en) 2018-08-16 2018-08-16 Wind noise detection and suppression method and device suitable for voice and audio

Publications (2)

Publication Number Publication Date
CN109215677A CN109215677A (en) 2019-01-15
CN109215677B true CN109215677B (en) 2020-09-29

Family

ID=64989091

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810935974.7A Active CN109215677B (en) 2018-08-16 2018-08-16 Wind noise detection and suppression method and device suitable for voice and audio

Country Status (1)

Country Link
CN (1) CN109215677B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201818959D0 (en) 2018-11-21 2019-01-09 Nokia Technologies Oy Ambience audio representation and associated rendering
GB201902812D0 (en) 2019-03-01 2019-04-17 Nokia Technologies Oy Wind noise reduction in parametric audio
CN111223493B (en) * 2020-01-08 2022-08-02 北京声加科技有限公司 Voice signal noise reduction processing method, microphone and electronic equipment
CN111833890B (en) * 2020-07-13 2023-07-25 北京声加科技有限公司 Device and method for automatically detecting wearing state of helmet
CN112309420B (en) * 2020-10-30 2023-06-27 出门问问(苏州)信息科技有限公司 Method and device for detecting wind noise
CN112309418B (en) * 2020-10-30 2023-06-27 出门问问(苏州)信息科技有限公司 Method and device for inhibiting wind noise
CN112242148B (en) * 2020-11-12 2023-06-16 北京声加科技有限公司 Headset-based wind noise suppression method and device
CN112802486B (en) * 2020-12-29 2023-02-14 紫光展锐(重庆)科技有限公司 Noise suppression method and device and electronic equipment
CN112884975A (en) * 2021-01-22 2021-06-01 李习平 Scenic spot commodity selling system based on solar street lamp
CN113380266B (en) * 2021-05-28 2022-06-28 中国电子科技集团公司第三研究所 Miniature dual-microphone speech enhancement method and miniature dual-microphone
US11670326B1 (en) * 2021-06-29 2023-06-06 Amazon Technologies, Inc. Noise detection and suppression
CN113707170A (en) * 2021-08-30 2021-11-26 展讯通信(上海)有限公司 Wind noise suppression method, electronic device, and storage medium
CN115691556B (en) * 2023-01-03 2023-03-14 北京睿科伦智能科技有限公司 Method for detecting multichannel voice quality of equipment side

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1530928A (en) * 2003-02-21 2004-09-22 哈曼贝克自动系统-威美科公司 System for inhibitting wind noise
WO2013187946A2 (en) * 2012-06-10 2013-12-19 Nuance Communications, Inc. Wind noise detection for in-car communication systems with multiple acoustic zones
CN104157295A (en) * 2014-08-22 2014-11-19 中国科学院上海高等研究院 Method used for detecting and suppressing transient noise
CN105792071A (en) * 2011-02-10 2016-07-20 杜比实验室特许公司 System and method for wind detection and suppression
CN106161751A (en) * 2015-04-14 2016-11-23 电信科学技术研究院 A kind of noise suppressing method and device
CN106448693A (en) * 2016-09-05 2017-02-22 华为技术有限公司 Speech signal processing method and apparatus

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8433564B2 (en) * 2009-07-02 2013-04-30 Alon Konchitsky Method for wind noise reduction

Also Published As

Publication number Publication date
CN109215677A (en) 2019-01-15

Similar Documents

Publication Publication Date Title
CN109215677B (en) Wind noise detection and suppression method and device suitable for voice and audio
CN111418010B (en) Multi-microphone noise reduction method and device and terminal equipment
US8898058B2 (en) Systems, methods, and apparatus for voice activity detection
CN104157295B (en) For detection and the method for transient suppression noise
CN102938254B (en) Voice signal enhancement system and method
JP5007442B2 (en) System and method using level differences between microphones for speech improvement
CN103718241A (en) Noise suppression device
Fingscheidt et al. Environment-optimized speech enhancement
CN110085246A (en) Sound enhancement method, device, equipment and storage medium
CN104835503A (en) Improved GSC self-adaptive speech enhancement method
US20180308503A1 (en) Real-time single-channel speech enhancement in noisy and time-varying environments
US11594239B1 (en) Detection and removal of wind noise
CN106653004B (en) Speaker identification feature extraction method for sensing speech spectrum regularization cochlear filter coefficient
Yousefian et al. Using power level difference for near field dual-microphone speech enhancement
Martín-Doñas et al. Dual-channel DNN-based speech enhancement for smartphones
Wu et al. A study on target feature activation and normalization and their impacts on the performance of DNN based speech dereverberation systems
Xiong et al. Spectro-Temporal SubNet for Real-Time Monaural Speech Denoising and Dereverberation.
Kim Signal processing for robust speech recognition motivated by auditory processing
CN106997768A (en) A kind of computational methods, device and the electronic equipment of voice probability of occurrence
CN116106826A (en) Sound source positioning method, related device and medium
WO2019205797A1 (en) Noise processing method, apparatus and device
Prasad et al. Two microphone technique to improve the speech intelligibility under noisy environment
Dionelis On single-channel speech enhancement and on non-linear modulation-domain Kalman filtering
Wang et al. Speech enhancement based on perceptually motivated guided spectrogram filtering
Zhang et al. A robust speech enhancement method based on microphone array

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant