CN113539291B - Noise reduction method and device for audio signal, electronic equipment and storage medium - Google Patents
- Publication number
- CN113539291B (application number CN202110777962.8A / CN202110777962A)
- Authority
- CN
- China
- Prior art keywords
- audio signal
- signal
- noise
- target
- audio
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0264—Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/724—User interfaces specially adapted for cordless or mobile telephones
- H04M1/72448—User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions
- H04M1/72454—User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions according to context-related or environment-related conditions
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02165—Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computer Networks & Wireless Communication (AREA)
- Environmental & Geological Engineering (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The present application provides a noise reduction method and apparatus for an audio signal, an electronic device, and a storage medium, and belongs to the technical field of audio processing. The method includes: performing noise reduction processing on a first audio signal based on the first audio signal and a second audio signal to obtain a third audio signal, where the first audio signal and the second audio signal are coherent audio signals; obtaining a target noise reference signal based on the second audio signal and one of the first audio signal and the third audio signal; and performing noise reduction processing on the third audio signal based on the target noise reference signal to obtain a target speech signal. The method improves the purity of the noise-reduced target speech signal.
Description
Technical Field
The present application relates to the field of audio processing technologies, and in particular, to a method and apparatus for noise reduction of an audio signal, an electronic device, and a storage medium.
Background
Dual-microphone noise reduction techniques are commonly employed by electronic devices to reduce noise in recorded audio signals. An electronic device adopting the dual-microphone noise reduction technique is generally provided with a primary microphone and a secondary microphone, and the audio signals recorded by the two microphones at the same time are coherent. In addition, the audio signal recorded by the primary microphone contains more speech and less noise, while the audio signal recorded by the secondary microphone contains more noise and less speech. The difference between the audio signals of the two microphones can therefore be used to reduce the noise in the audio signal recorded by the primary microphone.
In the related art, the speech signal recorded by the secondary microphone is determined based on the audio signal recorded by the primary microphone; that speech signal is removed from the audio signal recorded by the secondary microphone to obtain the noise signal in the secondary-microphone audio signal, and this noise signal is then removed from the audio signal recorded by the primary microphone to obtain a noise-reduced speech signal.
However, because the audio signal recorded by the primary microphone contains noise signals that are not recorded by the secondary microphone, such noise cannot be removed by this method, and noise therefore remains in the resulting noise-reduced speech signal, so that its purity is low.
Disclosure of Invention
The embodiments of the present application provide a noise reduction method and apparatus for an audio signal, an electronic device, and a storage medium. A target noise reference signal determined based on the second audio signal and one of the first audio signal and the third audio signal includes not only the noise signal in the second audio signal but also the noise signal in the first audio signal that is not recorded in the second audio signal. Performing noise reduction processing on the third audio signal based on this target noise reference signal therefore effectively removes the noise signal in the third audio signal and improves the purity of the noise-reduced speech signal.
In one aspect, there is provided a method of noise reduction of an audio signal, the method comprising: noise reduction processing is carried out on the first audio signal based on the first audio signal and the second audio signal to obtain a third audio signal, wherein the first audio signal and the second audio signal are audio signals with coherence; obtaining a target noise reference signal based on the second audio signal and one of the first audio signal and the third audio signal; and carrying out noise reduction processing on the third audio signal based on the target noise reference signal to obtain a target voice signal.
In some embodiments, the first audio signal and the second audio signal are respectively audio signals with coherence that are recorded by two audio acquisition components.
In some embodiments, the noise signal of the first audio signal is less than the noise signal of the second audio signal.
In one possible implementation manner, the obtaining a target noise reference signal based on the second audio signal and one of the first audio signal and the third audio signal includes: and fusing one of the first audio signal and the third audio signal with the second audio signal to obtain the target noise reference signal.
In one possible implementation manner, the fusing one of the first audio signal and the third audio signal with the second audio signal to obtain the target noise reference signal includes: determining a target fusion coefficient based on the first audio signal, the second audio signal, and the third audio signal; and fusing one of the first audio signal and the third audio signal with the second audio signal based on the target fusion coefficient to obtain the target noise reference signal.
In one possible implementation, the determining the target fusion coefficient based on the first audio signal, the second audio signal, and the third audio signal includes: respectively determining a first spectrogram of the first audio signal and a second spectrogram of the second audio signal, wherein the abscissa of the first spectrogram and the second spectrogram is frequency, and the ordinate of the first spectrogram and the second spectrogram is decibel; determining a coherent frequency band in the first spectrogram, wherein the coherent frequency band is a frequency band in which the decibel difference between the coherent frequency band and the second spectrogram in the first spectrogram is smaller than a preset decibel value; the target fusion coefficient is determined based on the coherent frequency band, the first audio signal, and the third audio signal.
In one possible implementation manner, the determining the target fusion coefficient based on the coherent frequency band, the first audio signal and the third audio signal includes: determining that the minimum frequency of the coherent frequency band is not lower than a preset frequency; respectively determining first energy and second energy, wherein the first energy is the energy sum of the first audio signal in the coherent frequency band, and the second energy is the energy sum of the third audio signal in the coherent frequency band; and determining a quotient of the second energy and the first energy as the target fusion coefficient.
In one possible implementation manner, the determining the target fusion coefficient based on the coherent frequency band, the first audio signal and the third audio signal includes: determining that the minimum frequency of the coherent frequency band is lower than a preset frequency; respectively determining first energy, second energy and third energy, wherein the first energy is the energy sum of the first audio signal in the coherent frequency band, the second energy is the energy sum of the third audio signal in the coherent frequency band, the third energy is the energy sum of the first audio signal in a full frequency band, and the full frequency band is a frequency band included by the first spectrogram; determining a quotient of the second energy and the first energy as a first fusion coefficient; determining a quotient of the first energy and the third energy as a second fusion coefficient; and determining the average value of the first fusion coefficient and the second fusion coefficient as the target fusion coefficient.
In one possible implementation manner, the fusing, based on the target fusion coefficient, one of the first audio signal and the third audio signal and the second audio signal to obtain the target noise reference signal includes: generating first relation data based on the target fusion coefficient, wherein the first relation data is relation data with parameters of the target fusion coefficient, independent variables of one audio signal of the first audio signal and the third audio signal, the second audio signal and the dependent variables of the target noise reference signal; and fusing one of the first audio signal and the third audio signal with the second audio signal through the first relation data to obtain the target noise reference signal.
In one possible implementation manner, the performing noise reduction processing on the third audio signal based on the target noise reference signal to obtain a target voice signal includes: determining a first speech signal in the third audio signal; removing the first voice signal in the target noise reference signal to obtain a first noise signal; and removing the first noise signal in the third audio signal to obtain the target voice signal.
In one possible implementation, the determining the first speech signal in the third audio signal includes: determining a fourth audio signal from the third audio signal, wherein the fourth audio signal is a signal containing a noise signal in the third audio signal; and taking the voice signal in the fourth audio signal as the first voice signal.
In one possible implementation manner, the noise reduction processing is performed on the first audio signal based on the first audio signal and the second audio signal to obtain a third audio signal, including: determining a second speech signal in the first audio signal; removing the second voice signal in the second audio signal to obtain a second noise signal; and removing the second noise signal in the first audio signal to obtain the third audio signal.
In one possible implementation, the method further includes: transmitting the target voice signal to enable the communication opposite terminal to execute a command corresponding to the target voice signal; or playing the target voice signal; or generating an audio file based on the target speech signal.
In another aspect, there is provided a noise reduction apparatus for an audio signal, the apparatus comprising:
The first processing module is used for carrying out noise reduction processing on the first audio signal based on the first audio signal and the second audio signal to obtain a third audio signal, wherein the first audio signal and the second audio signal are audio signals with coherence; a determining module for obtaining a target noise reference signal based on the second audio signal and one of the first audio signal and the third audio signal; and the second processing module is used for carrying out noise reduction processing on the third audio signal based on the target noise reference signal to obtain a target voice signal.
In some possible embodiments, the first audio signal and the second audio signal are respectively audio signals with coherence recorded by the two audio acquisition components.
In some possible embodiments, the noise signal of the first audio signal is less than the noise signal of the second audio signal.
In one possible implementation, the determining module includes: and the fusion unit is used for fusing one of the first audio signal and the third audio signal with the second audio signal to obtain the target noise reference signal.
In one possible implementation, the fusion unit includes: a determining subunit configured to determine a target fusion coefficient based on the first audio signal, the second audio signal, and the third audio signal; and the fusion subunit is used for fusing one of the first audio signal and the third audio signal with the second audio signal based on the target fusion coefficient to obtain the target noise reference signal.
In a possible implementation manner, the determining subunit is configured to: respectively determining a first spectrogram of the first audio signal and a second spectrogram of the second audio signal, wherein the abscissa of the first spectrogram and the second spectrogram is frequency, and the ordinate of the first spectrogram and the second spectrogram is decibel; determining a coherent frequency band in the first spectrogram, wherein the coherent frequency band is a frequency band in which the decibel difference between the coherent frequency band and the second spectrogram in the first spectrogram is smaller than a preset decibel value; the target fusion coefficient is determined based on the coherent frequency band, the first audio signal, and the third audio signal.
In a possible implementation manner, the determining subunit is configured to: determining that the minimum frequency of the coherent frequency band is not lower than a preset frequency; respectively determining first energy and second energy, wherein the first energy is the energy sum of the first audio signal in the coherent frequency band, and the second energy is the energy sum of the third audio signal in the coherent frequency band; determining a quotient of the second energy and the first energy as the target fusion coefficient.
In a possible implementation manner, the determining subunit is configured to: determining that the minimum frequency of the coherent frequency band is lower than a preset frequency; respectively determining first energy, second energy and third energy, wherein the first energy is the energy sum of the first audio signal in the coherent frequency band, the second energy is the energy sum of the third audio signal in the coherent frequency band, the third energy is the energy sum of the first audio signal in a full frequency band, and the full frequency band is a frequency band included by the first spectrogram; determining a quotient of the second energy and the first energy as a first fusion coefficient; determining a quotient of the first energy and the third energy as a second fusion coefficient; and determining the average value of the first fusion coefficient and the second fusion coefficient as the target fusion coefficient.
In one possible implementation, the fusion subunit is configured to: generating first relation data based on the target fusion coefficient, wherein the first relation data is relation data with parameters of the target fusion coefficient, independent variables of one audio signal of the first audio signal and the third audio signal, the second audio signal and the dependent variables of the target noise reference signal; and fusing one of the first audio signal and the third audio signal with the second audio signal through the first relation data to obtain the target noise reference signal.
In one possible implementation manner, the second processing module includes: a determining unit configured to determine a first speech signal in the third audio signal; a first removing unit, configured to remove the first speech signal in the target noise reference signal, to obtain a first noise signal; and the second removing unit is used for removing the first noise signal in the third audio signal to obtain the target voice signal.
In a possible implementation manner, the determining unit is configured to: determining a fourth audio signal from the third audio signal, wherein the fourth audio signal is a signal containing a noise signal in the third audio signal; and taking the voice signal in the fourth audio signal as the first voice signal.
In one possible implementation manner, the first processing module is configured to: determining a second speech signal in the first audio signal; removing the second voice signal in the second audio signal to obtain a second noise signal; and removing the second noise signal in the first audio signal to obtain the third audio signal.
In one possible implementation, the apparatus further includes: the sending module is used for sending the target voice signal to enable the communication opposite terminal to execute a command corresponding to the target voice signal; the playing module is used for playing the target voice signal; and the generating module is used for generating an audio file based on the target voice signal.
In another aspect, an electronic device is provided that includes one or more processors and one or more memories having stored therein at least one program code loaded and executed by the one or more processors to perform operations as performed by the method of noise reduction of an audio signal as described above.
In another aspect, a computer readable storage medium having stored therein at least one program code loaded and executed by a processor to perform operations performed by a method for noise reduction of an audio signal as described above is provided.
In another aspect, a computer program product or a computer program is provided, the computer program product or the computer program comprising computer program code, the computer program code being stored in a computer readable storage medium. The processor of the electronic device reads the computer program code from the computer readable storage medium and executes the computer program code such that the electronic device performs the operations performed by the above-described noise reduction method of an audio signal.
The technical scheme provided by the embodiment of the application has the beneficial effects that at least:
In the embodiment of the application, the target noise reference signal is obtained based on one of the first audio signal and the third audio signal and the second audio signal, so that the target noise reference signal comprises the noise signal in the second audio signal and the noise signal in the first audio signal which is not recorded by the second audio signal, and further the noise reduction processing is carried out on the third audio signal based on the target noise reference signal, so that the noise signal in the third audio signal can be effectively removed, and the purity of the target voice signal after noise reduction is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of an implementation environment of a noise reduction method for an audio signal according to an embodiment of the present application;
fig. 2 is a flowchart of a method for noise reduction of an audio signal according to an embodiment of the present application;
FIG. 3 is a graph of a frequency spectrum of an audio signal according to an embodiment of the present application;
FIG. 4 is a graph of a frequency spectrum of an audio signal according to an embodiment of the present application;
FIG. 5 is a flow chart of a noise reduction process provided by an embodiment of the present application;
FIG. 6 is a flow chart of a noise reduction process provided by an embodiment of the present application;
Fig. 7 is a schematic diagram of a noise reduction device for an audio signal according to an embodiment of the present application;
Fig. 8 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
The terms "first," "second," "third," and "fourth" and the like in the description and in the claims and drawings are used for distinguishing between different objects and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprising," "including," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
The method for reducing noise of the audio signal provided by the embodiment of the application can be applied to electronic equipment, and the electronic equipment can collect the first audio signal and the second audio signal, wherein the first audio signal and the second audio signal are coherent audio signals, and the noise signal of the first audio signal is less than that of the second audio signal.
In some embodiments, the electronic device may include two audio capturing units, where the first audio signal and the second audio signal are recorded by the two audio capturing units, and then, by using the method provided by the embodiment of the present application, noise reduction processing is performed on the first audio signal based on the first audio signal and the second audio signal, so as to obtain the target speech signal. Wherein the audio acquisition component may be a microphone; the electronic device may comprise two microphones, a primary microphone and a secondary microphone, respectively.
Several scenarios of the noise reduction method of the audio signal are described below by way of example.
First: the noise reduction method of the audio signal can be applied to a scene of voice control, and the electronic equipment is a voice control terminal, such as a mobile phone, a landline phone, an interphone and the like; accordingly, referring to fig. 1, an implementation environment of the noise reduction method of an audio signal includes an electronic device 10 and a communication counterpart 20. In the process of performing voice control on the communication opposite terminal 20, the electronic device 10 performs noise reduction according to the method provided by the embodiment of the present application to obtain a target voice signal, and sends the target voice signal to the communication opposite terminal 20, so that the communication opposite terminal executes a command corresponding to the target voice signal, thereby improving the quality of performing voice control on the communication opposite terminal 20 by the electronic device 10.
Second: the noise reduction method of the audio signal can be applied to a scene of playing audio, where the electronic device is a playback terminal, such as a microphone, an earphone, and the like; in the process of playing the audio, the electronic device performs noise reduction according to the method provided by the embodiment of the present application to obtain the target speech signal and plays the target speech signal, thereby improving the quality of the audio output by the playback terminal.
Third: the noise reduction method of the audio signal can be applied to a scene of recording audio, where the electronic device is a recording terminal, such as a recorder, a recording pen, a video camera, and the like; in the process of recording the audio, the electronic device performs noise reduction according to the method provided by the embodiment of the present application to obtain the target speech signal and generates an audio file based on the target speech signal, thereby improving the quality of the audio recorded by the recording terminal.
An embodiment of the present application provides a method for noise reduction of an audio signal, where the method steps may be performed by an electronic device, referring to fig. 2, and the method includes:
step 201: a first audio signal and a second audio signal are acquired.
The first audio signal and the second audio signal are audio signals with coherence respectively recorded by the two audio acquisition components. The two audio acquisition components are two audio acquisition components of the same electronic equipment; for example, the two audio capturing components may be a primary microphone and a secondary microphone, respectively.
It should be noted that both the first audio signal and the second audio signal include a speech signal and a noise signal. The energy of the speech signal in the first audio signal is greater than the energy of the speech signal in the second audio signal, i.e. the first audio signal contains more speech than the second audio signal; and the energy of the noise signal in the second audio signal is greater than the energy of the noise signal in the first audio signal, i.e. the second audio signal contains more noise than the first audio signal.
Step 202: and carrying out noise reduction processing on the first audio signal based on the first audio signal and the second audio signal to obtain a third audio signal.
This step can be achieved by the following steps (1) - (3):
(1) A second speech signal in the first audio signal is determined.
The first audio signal includes a speech signal and a noise signal. The first audio signal is filtered to cancel the noise signal in it, and the second speech signal is obtained; the filtering is performed by a first adaptive filter.
(2) And removing the second voice signal in the second audio signal to obtain a second noise signal.
The second audio signal includes a speech signal and a noise signal. In the embodiment of the present application, after the second speech signal is removed from the second audio signal, a clean second noise signal that does not include a speech signal is obtained.
(3) And removing the second noise signal in the first audio signal to obtain a third audio signal.
The first audio signal includes a speech signal, the second noise signal, and other noise signals besides the second noise signal. The second noise signal is filtered to obtain a filtered second noise signal, and the filtered second noise signal is removed from the first audio signal to obtain the third audio signal. The second noise signal is filtered by a second adaptive filter.
It should be noted that, since the first audio signal includes the speech signal, the second noise signal, and other noise signals, removing the second noise signal from the first audio signal leaves a third audio signal that still includes the speech signal and the other noise signals.
Referring to fig. 3, the upper half of fig. 3 is a partial spectrogram of a first audio signal, and the lower half of fig. 3 is a partial spectrogram of a second audio signal. The boxed portion of the figure marks other noise signals that are included in the first audio signal but not in the second audio signal; that is, the first audio signal also contains noise that the second audio signal does not record, so the second audio signal does not contain all of the noise in the first audio signal.
Referring to fig. 4, the upper half of fig. 4 is a partial spectrogram of a third audio signal, the middle part is a partial spectrogram of a first audio signal, and the lower half is a partial spectrogram of a second audio signal. The boxed portion marks the other noise signals included in the first audio signal but not in the second audio signal; that is, after the second noise signal is removed from the first audio signal, the resulting third audio signal still includes these other noise signals.
Referring to fig. 5, fig. 5 is a flowchart of performing noise reduction processing on the first audio signal based on the first audio signal and the second audio signal to obtain a third audio signal. And filtering the first audio signal through a first adaptive filter to eliminate noise signals in the first audio signal and obtain a second voice signal. And removing the second voice signal from the second audio signal to obtain a purer second noise signal. Filtering the second noise signal to obtain a filtered second noise signal; and removing the second noise signal after the filtering processing from the first audio signal to obtain a third audio signal.
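The embodiment does not specify the adaptive filters beyond calling them a first and a second adaptive filter. The following Python sketch is one minimal way to realize the flow of fig. 5, assuming normalized-LMS (NLMS) filters and an illustrative filter order of 64; the function and variable names are chosen here for illustration and are not taken from the patent.

```python
import numpy as np

def nlms(reference, desired, order=64, mu=0.5, eps=1e-8):
    """Normalized LMS adaptive filter: adapts weights so that filtering
    `reference` tracks `desired`; returns (estimate, error) per sample."""
    reference = np.asarray(reference, dtype=float)
    desired = np.asarray(desired, dtype=float)
    w = np.zeros(order)
    buf = np.zeros(order)
    estimate = np.zeros(len(desired))
    error = np.zeros(len(desired))
    for n in range(len(desired)):
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        estimate[n] = w @ buf
        error[n] = desired[n] - estimate[n]
        w = w + mu * error[n] * buf / (buf @ buf + eps)
    return estimate, error

def first_stage(first_audio, second_audio, order=64):
    # First adaptive filter: estimate the speech component shared by the two
    # microphones (the "second speech signal") from the first audio signal and
    # remove it from the second audio signal, leaving the second noise signal.
    second_speech, second_noise = nlms(first_audio, second_audio, order)
    # Second adaptive filter: match the second noise signal to the noise in the
    # first audio signal and subtract it, giving the third audio signal.
    filtered_noise, third_audio = nlms(second_noise, first_audio, order)
    return third_audio, second_noise
```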
In the embodiment of the present application, the first audio signal is subjected to noise reduction processing based on the first audio signal and the second audio signal, so that the second noise signal in the first audio signal is removed and the noise in the first audio signal is reduced, which facilitates the subsequent noise reduction processing of the third audio signal from which the second noise signal has been removed.
Step 203: a target noise reference signal is derived based on the second audio signal and one of the first audio signal and the third audio signal.
This step may be accomplished by:
Fusing one of the first audio signal and the third audio signal with the second audio signal to obtain a target noise reference signal; thus, the target noise reference signal includes both the speech signal and the noise signal in the first audio signal and the speech signal and the noise signal in the second audio signal.
In one possible implementation, the third audio signal and the second audio signal are fused to obtain the target noise reference signal. In the embodiment of the application, the third audio signal and the second audio signal are both audio signals comprising voice signals and noise signals, and because the third audio signal is the audio signal for removing part of the noise signals, the third audio signal and the second audio signal are fused to obtain the target noise reference signal, thereby facilitating subsequent rapid and efficient processing of the third audio signal.
In another possible implementation, the first audio signal and the second audio signal are fused to obtain the target noise reference signal. In the embodiment of the present application, the first audio signal and the second audio signal are both audio signals that include a speech signal and a noise signal, and directly fusing the first audio signal with the second audio signal to obtain the target noise reference signal is simple and convenient.
Wherein fusing the second audio signal and one of the first audio signal and the third audio signal may be achieved by the following steps (1) - (2):
(1) A target fusion coefficient is determined based on the first audio signal, the second audio signal, and the third audio signal.
This step can be achieved by the following steps A1-A3:
a1: and respectively determining a first spectrogram of the first audio signal and a second spectrogram of the second audio signal, wherein the abscissa of the first spectrogram and the second spectrogram is frequency, and the ordinate of the first spectrogram and the second spectrogram is decibel.
The method comprises the steps of carrying out Fourier transform on a first audio signal to generate a first spectrogram, and carrying out Fourier transform on a second audio signal to generate a second spectrogram.
In the embodiment of the application, the coherence between the first audio signal and the second audio signal can be simply and intuitively determined through the first spectrogram and the second spectrogram.
A2: and determining a coherent frequency band in the first spectrogram, wherein the coherent frequency band is a frequency band in which the decibel difference between the coherent frequency band and the second spectrogram in the first spectrogram is smaller than a preset decibel value.
It should be noted that the preset decibel value may be set and changed as needed; in the embodiment of the present application, the preset decibel value is set to 5 dB.
The first frequency spectrogram and the second frequency spectrogram both comprise a plurality of frequency bands, and the coherent frequency band is a frequency band with a decibel difference between the first frequency spectrogram and the second frequency spectrogram being smaller than a preset decibel value, so that the coherence of the first audio signal and the second audio signal corresponding to the determined coherent frequency band is high.
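A minimal sketch of steps A1 and A2, assuming a single FFT frame per signal and the 5 dB preset decibel value mentioned above. It returns the frequency bins whose decibel difference is below the threshold rather than a contiguous band, so grouping those bins into the coherent frequency band is left out; the names are illustrative.

```python
import numpy as np

def db_spectrum(frame, n_fft=1024):
    # Step A1: magnitude spectrum of one frame, expressed in decibels.
    magnitude = np.abs(np.fft.rfft(frame, n_fft))
    return 20.0 * np.log10(magnitude + 1e-12)

def coherent_bins(first_frame, second_frame, fs, n_fft=1024, preset_db=5.0):
    # Step A2: frequency bins where the first and second spectrograms differ
    # by less than the preset decibel value (5 dB in this embodiment).
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    diff = np.abs(db_spectrum(first_frame, n_fft) - db_spectrum(second_frame, n_fft))
    mask = diff < preset_db
    return freqs[mask], mask
```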
A3: a target fusion coefficient is determined based on the coherent frequency band, the first audio signal, and the third audio signal.
The method comprises the following two implementation modes:
The first implementation mode: determining that the minimum frequency of the coherent frequency band is not lower than a preset frequency; respectively determining first energy and second energy, wherein the first energy is the energy sum of the first audio signal in the coherent frequency band, and the second energy is the energy sum of the third audio signal in the coherent frequency band; the quotient of the second energy and the first energy is determined as a target fusion coefficient.
It should be noted that the preset frequency may be set and changed as needed; in the embodiment of the present application, the preset frequency is set to 2000 Hz.
The first audio signal in the coherent frequency band is Fourier-transformed to generate a first power spectrum of the first audio signal in the coherent frequency band, and the third audio signal in the coherent frequency band is Fourier-transformed to generate a second power spectrum of the third audio signal in the coherent frequency band. The abscissa of the first power spectrum and the second power spectrum is frequency, and the ordinate is power. The energy sum of the first audio signal in the coherent frequency band is the area covered by the first power spectrum, and the energy sum of the third audio signal in the coherent frequency band is the area covered by the second power spectrum.
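The area of a power spectrum over a band can be approximated numerically. The helper below is a sketch under that assumption (power per FFT bin multiplied by bin width); the name and parameters are chosen for illustration.

```python
import numpy as np

def band_energy(signal, fs, f_lo, f_hi, n_fft=1024):
    # Energy sum of `signal` in [f_lo, f_hi]: the area covered by its power
    # spectrum in that band, approximated as power per bin times bin width.
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal, n_fft)) ** 2
    in_band = (freqs >= f_lo) & (freqs <= f_hi)
    bin_width = freqs[1] - freqs[0]
    return float(np.sum(power[in_band]) * bin_width)
```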
In the embodiment of the present application, the second energy is the energy sum, in the coherent frequency band, of the third audio signal that contains the other noise signals, and the first energy is the energy sum, in the coherent frequency band, of the first audio signal that contains the speech signal, the second noise signal, and the other noise signals. Taking the quotient of the second energy and the first energy as the target fusion coefficient therefore lets the coefficient represent the weight of the other noise signals within the first audio signal in the coherent frequency band. Clearly, if the coherent frequency band contains mostly speech, it contains little of the second noise signal and the target fusion coefficient is large; if the coherent frequency band contains more noise, it also contains more of the second noise signal and the target fusion coefficient is small.
The second implementation mode: determining that the minimum frequency of the coherent frequency band is lower than a preset frequency; respectively determining first energy, second energy and third energy, wherein the first energy is the energy sum of the first audio signal in a coherent frequency band; the second energy is the energy sum of the third audio signal in the coherent frequency band, the third energy is the energy sum of the first audio signal in the full frequency band, and the full frequency band is the frequency band included by the first spectrogram. The quotient of the second energy and the first energy is determined as a first fusion coefficient. The quotient of the first energy and the third energy is determined as a second fusion coefficient. And determining the average value of the first fusion coefficient and the second fusion coefficient as a target fusion coefficient.
The first audio signal in the coherent frequency band is Fourier-transformed to generate a first power spectrum of the first audio signal in the coherent frequency band; the third audio signal in the coherent frequency band is Fourier-transformed to generate a second power spectrum of the third audio signal in the coherent frequency band; and the first audio signal in the full frequency band is Fourier-transformed to generate a third power spectrum of the first audio signal in the full frequency band. The abscissa of the first, second, and third power spectra is frequency, and the ordinate is power. The energy sum of the first audio signal in the coherent frequency band is the area covered by the first power spectrum, the energy sum of the third audio signal in the coherent frequency band is the area covered by the second power spectrum, and the energy sum of the first audio signal in the full frequency band is the area covered by the third power spectrum.
In the embodiment of the present application, the first energy is the energy sum of the first audio signal in the coherent frequency band, and the third energy is the energy sum of the first audio signal in the full frequency band. The coherence of the first audio signal and the second audio signal is high in the coherent frequency band, i.e. this band contains the second noise signal from the second audio signal. Taking the quotient of the first energy and the third energy as the second fusion coefficient therefore lets the second fusion coefficient represent the weight, within the full-band first audio signal, of the part that contains the second noise signal, while the first fusion coefficient represents the weight of the other noise signals within the first audio signal in the coherent frequency band. Taking the average of the first fusion coefficient and the second fusion coefficient as the target fusion coefficient then lets the target fusion coefficient represent the weight of all noise signals in the first audio signal. Clearly, if the coherent frequency band contains mostly speech, it contains little noise and the target fusion coefficient is large; if the coherent frequency band contains more noise, the target fusion coefficient is small.
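Combining the two implementations of step A3, the sketch below reuses the band_energy helper from the previous sketch. The 2000 Hz preset frequency follows the embodiment, and the (f_lo, f_hi) pair is assumed to describe the coherent frequency band derived from the coherent bins found earlier; this is an illustrative arrangement, not the patent's prescribed code.

```python
def target_fusion_coefficient(first_audio, third_audio, fs, band,
                              preset_freq=2000.0, n_fft=1024):
    # `band` is the (lowest, highest) frequency of the coherent frequency band;
    # 2000 Hz is the preset frequency used in this embodiment.
    f_lo, f_hi = band
    first_energy = band_energy(first_audio, fs, f_lo, f_hi, n_fft)
    second_energy = band_energy(third_audio, fs, f_lo, f_hi, n_fft)
    if f_lo >= preset_freq:
        # First implementation: minimum frequency not lower than the preset frequency.
        return second_energy / first_energy
    # Second implementation: also use the full-band energy of the first audio signal.
    third_energy = band_energy(first_audio, fs, 0.0, fs / 2.0, n_fft)
    first_coefficient = second_energy / first_energy
    second_coefficient = first_energy / third_energy
    return 0.5 * (first_coefficient + second_coefficient)
```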
(2) And fusing one of the first audio signal and the third audio signal with the second audio signal based on the target fusion coefficient to obtain a target noise reference signal.
This step is achieved by the following steps A1-A2:
A1: based on the target fusion coefficient, generating first relation data, wherein the first relation data is relation data with parameters being the target fusion coefficient, independent variables being one of the first audio signal and the third audio signal, the second audio signal and dependent variables being target noise reference signals.
The first relation data is: N_new = α * N_old + (1 - α) * S
where N_new is the target noise reference signal, N_old is the second audio signal, S is one of the first audio signal and the third audio signal, and α is the target fusion coefficient.
A2: and fusing one of the first audio signal and the third audio signal with the second audio signal through the first relation data to obtain a target noise reference signal.
One of the first audio signal and the third audio signal, together with the second audio signal, is substituted into the first relation data to obtain the target noise reference signal.
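Applied to sample-aligned signals of equal length, the first relation data reduces to a one-line weighted sum; the sketch below simply evaluates it, with names chosen for illustration.

```python
import numpy as np

def fuse_noise_reference(s, n_old, alpha):
    # First relation data: N_new = alpha * N_old + (1 - alpha) * S, where S is
    # one of the first and third audio signals, N_old is the second audio
    # signal, and alpha is the target fusion coefficient.
    s = np.asarray(s, dtype=float)
    n_old = np.asarray(n_old, dtype=float)
    return alpha * n_old + (1.0 - alpha) * s
```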
It should be noted that, in the embodiment of the present application, the coherent frequency band of the first audio signal may be determined through VAD (Voice Activity Detection, also referred to as silence suppression) processing. Referring to the first relation data, if the VAD judgment deviates: when a segment is misjudged as a speech segment, the target fusion coefficient is larger, a larger proportion of the second audio signal is retained, and the damage to the speech in the selected one of the first audio signal and the third audio signal is smaller; when a segment is misjudged as a noise segment, the target fusion coefficient is smaller, a larger proportion of the selected one of the first audio signal and the third audio signal is retained, and the noise in that signal can still be effectively filtered out.
In the embodiment of the application, the target noise reference signal is obtained by fusing one of the first audio signal and the third audio signal with the second audio signal; thus, the target noise reference signal comprises the voice signal and the noise signal in the first audio signal and also comprises the voice signal and the noise signal in the second audio signal, so that when the noise reduction processing is carried out on the third audio signal based on the target noise reference signal, all the noise signals in the third audio signal can be effectively removed.
Step 204: and carrying out noise reduction processing on the third audio signal based on the target noise reference signal to obtain a target voice signal.
This step can be achieved by the following steps (1) - (3):
(1) A first speech signal in the third audio signal is determined.
This step can be achieved by the following steps A1-A2:
a1: from the third audio signal, a fourth audio signal is determined.
The fourth audio signal is a signal, within the third audio signal, that contains a noise signal; it is obtained by performing VAD processing on the third audio signal.
In the embodiment of the present application, VAD processing of the third audio signal can identify and eliminate long silence periods from the third audio signal, saving speech channel resources without reducing service quality. VAD processing can also identify the signal containing a noise signal within the third audio signal, so performing VAD processing on the third audio signal yields the fourth audio signal containing a noise signal. (A simple illustrative VAD sketch is given after step A2 below.)
A2: and taking the voice signal in the fourth audio signal as the first voice signal.
The fourth audio signal includes not only a noise signal but also a speech signal. The fourth audio signal is filtered to cancel the noise signal in it, and the speech signal in the fourth audio signal is obtained as the first speech signal; the filtering is performed by the first adaptive filter.
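The embodiment does not state which VAD algorithm is used. The following is a deliberately simple, hypothetical energy-threshold VAD, included only to make the fourth-audio-signal step concrete; the threshold and frame length are illustrative assumptions.

```python
import numpy as np

def simple_vad(signal, fs, frame_ms=20, threshold_db=-40.0):
    # Hypothetical energy-threshold VAD: one boolean per frame, True where the
    # frame is treated as active rather than long silence.
    frame_len = int(fs * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = np.reshape(np.asarray(signal[:n_frames * frame_len], dtype=float),
                        (n_frames, frame_len))
    level_db = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    return level_db > threshold_db
```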
(2) And removing the first voice signal in the target noise reference signal to obtain a first noise signal.
The target noise reference signal comprises a voice signal and a noise signal; in the embodiment of the application, after the first voice signal in the target noise reference signal is removed, a pure first noise signal which does not comprise the voice signal is obtained.
(3) And removing the first noise signal in the third audio signal to obtain a target voice signal.
Wherein the third audio signal includes a speech signal and a first noise signal. Filtering the first noise signal to obtain a filtered first noise signal; and removing the first noise signal after the filtering processing from the third audio signal to obtain a target voice signal. The first noise signal is filtered by the second adaptive filter.
Referring to fig. 6, fig. 6 is a flowchart of performing noise reduction processing on the third audio signal based on the target noise reference signal to obtain a target speech signal. And performing VAD processing on the third audio signal to obtain a fourth audio signal. And filtering the fourth audio signal, and eliminating the noise signal in the fourth audio signal to obtain the first voice signal. And removing the first voice signal to obtain a purer first noise signal. Filtering the first noise signal to obtain a filtered first noise signal; and removing the first noise signal after the filtering processing from the third audio signal to obtain a target voice signal.
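Mirroring the stage-one structure, the flow of fig. 6 can be sketched with the same nlms helper introduced earlier. The VAD selection of the fourth audio signal is omitted here for brevity, and the pairing of filter inputs is an assumption rather than the patent's prescribed wiring.

```python
def second_stage(third_audio, target_noise_reference, order=64):
    # First adaptive filter: estimate the first speech signal (the speech the
    # noise reference shares with the third audio signal) and remove it from
    # the reference, leaving the first noise signal.
    first_speech, first_noise = nlms(third_audio, target_noise_reference, order)
    # Second adaptive filter: match the first noise signal to the residual noise
    # in the third audio signal and subtract it, giving the target speech signal.
    filtered_noise, target_speech = nlms(first_noise, third_audio, order)
    return target_speech
```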
In the embodiment of the present application, noise reduction processing is performed on the third audio signal based on the target noise reference signal; because the target noise reference signal contains all of the noise signals in the third audio signal, the noise in the third audio signal can be removed more completely, which improves the purity of the noise-reduced target speech signal. On the premise of preserving the listening experience, and with little or no damage to the speech, the noise reduction method for an audio signal provided by the embodiment of the present application eliminates the noise residue caused by insufficient coherence between the two audio signals and improves the robustness of the audio noise reduction.
Step 205: and performing target operation on the target voice signal.
This step includes several implementations:
In a possible implementation manner, the method for noise reduction of an audio signal provided by the embodiment of the application is applied to a scene of voice control, and a target voice signal is sent to enable a communication opposite terminal to execute a command corresponding to the target voice signal.
In this implementation, the electronic device is a voice control terminal, and the communication peer may be a mobile phone, a robot, or the like. After the electronic equipment sends the target voice signal to the opposite communication terminal, the opposite communication terminal receives the target voice signal, so that the opposite communication terminal executes a command corresponding to the target voice signal, and further the application of the target voice signal obtained by the method provided by the embodiment of the application in voice control is realized.
In another possible implementation manner, the noise reduction method for an audio signal provided by the embodiment of the application is applied to a scene where audio is played, and a target voice signal is played.
In this implementation, the electronic device is a playback terminal, which may be a microphone, an earphone, or the like. After the target voice signal is played, the application of the target voice signal obtained by the method provided by the embodiment of the application on audio playing is realized.
In another possible implementation manner, the method for noise reduction of an audio signal provided by the embodiment of the application is applied to a scene of recording audio, and generates an audio file based on a target voice signal.
In this implementation, the electronic device is a recording terminal, which may be a recording pen, a recorder, a video camera, etc. Based on the target voice signal, an audio file is generated, and the application of the target voice signal obtained by the method provided by the embodiment of the application on audio recording is realized.
In the embodiment of the application, the target noise reference signal is obtained by fusing one of the first audio signal and the third audio signal with the second audio signal, so that the target noise reference signal comprises the noise signal in the second audio signal and the noise signal in the first audio signal which is not recorded by the second audio signal, and further the noise reduction processing is carried out on the third audio signal based on the target noise reference signal, so that the noise signal in the third audio signal can be effectively removed, and the purity of the target voice signal after noise reduction is improved.
An embodiment of the present application provides a noise reduction device for an audio signal, referring to fig. 7, the device includes:
The first processing module 701 is configured to perform noise reduction processing on the first audio signal based on the first audio signal and the second audio signal, so as to obtain a third audio signal, where the first audio signal and the second audio signal are audio signals with coherence; a determining module 702, configured to obtain a target noise reference signal based on one of the first audio signal and the third audio signal and the second audio signal; the second processing module 703 is configured to perform noise reduction processing on the third audio signal based on the target noise reference signal, so as to obtain a target speech signal.
In some possible embodiments, the first audio signal and the second audio signal are respectively audio signals with coherence recorded by the two audio acquisition components.
In some possible embodiments, the noise signal of the first audio signal is less than the noise signal of the second audio signal.
In one possible implementation, the determining module 702 includes: and the fusion unit is used for fusing one of the first audio signal and the third audio signal with the second audio signal to obtain a target noise reference signal.
In one possible implementation, the fusion unit includes: a determining subunit configured to determine a target fusion coefficient based on the first audio signal, the second audio signal, and the third audio signal; and the fusion subunit is used for fusing one of the first audio signal and the third audio signal with the second audio signal based on the target fusion coefficient to obtain a target noise reference signal.
In one possible implementation, the determining subunit is configured to: respectively determining a first spectrogram of a first audio signal and a second spectrogram of a second audio signal, wherein the abscissa of the first spectrogram and the second spectrogram is frequency, and the ordinate of the first spectrogram and the second spectrogram is decibel; determining a coherent frequency band in the first spectrogram, wherein the coherent frequency band is a frequency band with a decibel difference between the coherent frequency band and the second spectrogram being smaller than a preset decibel value; a target fusion coefficient is determined based on the coherent frequency band, the first audio signal, and the third audio signal.
In one possible implementation, the determining subunit is configured to: determining that the minimum frequency of the coherent frequency band is not lower than a preset frequency; respectively determining first energy and second energy, wherein the first energy is the energy sum of the first audio signal in the coherent frequency band, and the second energy is the energy sum of the third audio signal in the coherent frequency band; the quotient of the second energy and the first energy is determined as a target fusion coefficient.
In one possible implementation, the determining subunit is configured to: determining that the minimum frequency of the coherent frequency band is lower than a preset frequency; respectively determining first energy, second energy and third energy, wherein the first energy is the energy sum of the first audio signal in the coherent frequency band, the second energy is the energy sum of the third audio signal in the coherent frequency band, the third energy is the energy sum of the first audio signal in a full frequency band, and the full frequency band is a frequency band included by a first spectrogram; determining the quotient of the second energy and the first energy as a first fusion coefficient; determining the quotient of the first energy and the third energy as a second fusion coefficient; and determining the average value of the first fusion coefficient and the second fusion coefficient as a target fusion coefficient.
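Taking the two cases above together, a minimal Python sketch of the target fusion coefficient computation could look as follows. The 200 Hz preset frequency and the FFT parameters are illustrative assumptions; `coherent_band` is the (f_min, f_max) pair, for example as produced by the band-search sketch above.

```python
import numpy as np

EPS = 1e-12  # guard against division by zero in this sketch

def band_energy(signal, f_low, f_high, n_fft=1024, sample_rate=16000):
    """Sum of spectral energy of `signal` between f_low and f_high (Hz)."""
    power = np.abs(np.fft.rfft(signal, n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    mask = (freqs >= f_low) & (freqs <= f_high)
    return float(np.sum(power[mask]))

def target_fusion_coefficient(first_audio, third_audio, coherent_band,
                              preset_freq=200.0, n_fft=1024,
                              sample_rate=16000):
    f_min, f_max = coherent_band
    # First energy: first audio signal inside the coherent band.
    e_first = band_energy(first_audio, f_min, f_max, n_fft, sample_rate)
    # Second energy: third (noise-reduced) audio signal inside the coherent band.
    e_second = band_energy(third_audio, f_min, f_max, n_fft, sample_rate)

    if f_min >= preset_freq:
        # Minimum frequency not lower than the preset frequency:
        # the coefficient is the quotient of the two band energies.
        return e_second / (e_first + EPS)

    # Minimum frequency below the preset frequency: also use the
    # full-band energy of the first audio signal (third energy).
    e_third = band_energy(first_audio, 0.0, sample_rate / 2.0,
                          n_fft, sample_rate)
    first_coeff = e_second / (e_first + EPS)
    second_coeff = e_first / (e_third + EPS)
    return 0.5 * (first_coeff + second_coeff)
```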
In one possible implementation, the fusion subunit is configured to: generate first relation data based on the target fusion coefficient, where the first relation data takes the target fusion coefficient as a parameter, takes one of the first audio signal and the third audio signal together with the second audio signal as independent variables, and takes the target noise reference signal as a dependent variable; and fuse one of the first audio signal and the third audio signal with the second audio signal through the first relation data to obtain the target noise reference signal.
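As a rough illustration of how the fusion subunit might apply such relation data, the sketch below uses a simple linear combination parameterised by the target fusion coefficient. The additive form chosen here is an assumption; this passage only states that the relation is parameterised by the coefficient, not its exact functional form.

```python
import numpy as np

def fuse_to_noise_reference(chosen_audio, second_audio, target_coeff):
    """Fuse one of the first/third audio signals (`chosen_audio`) with the
    second audio signal.  The linear form reference = chosen + coeff * second
    is an illustrative assumption, not the relation defined by the patent."""
    chosen = np.asarray(chosen_audio, dtype=float)
    second = np.asarray(second_audio, dtype=float)
    n = min(len(chosen), len(second))  # align lengths defensively
    return chosen[:n] + target_coeff * second[:n]
```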
In one possible implementation, the second processing module 703 includes: a determining unit, configured to determine a first voice signal in the third audio signal; a first removing unit, configured to remove the first voice signal from the target noise reference signal to obtain a first noise signal; and a second removing unit, configured to remove the first noise signal from the third audio signal to obtain the target voice signal.
In a possible implementation, the determining unit is configured to: determining a fourth audio signal from the third audio signal, wherein the fourth audio signal is a signal containing a noise signal in the third audio signal; and taking the voice signal in the fourth audio signal as the first voice signal.
In one possible implementation, the first processing module 701 is configured to: determining a second speech signal in the first audio signal; removing a second voice signal in the second audio signal to obtain a second noise signal; and removing the second noise signal in the first audio signal to obtain a third audio signal.
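Both the first processing module (first/second audio signals) and the second processing module (third audio signal and target noise reference signal) follow the same two-stage removal pattern. The patent does not name the filtering technique used for these removal steps, so the sketch below uses a normalized LMS (NLMS) adaptive filter and treats the speech-dominant input itself as the speech estimate; both choices are illustrative assumptions only.

```python
import numpy as np

def nlms_cancel(primary, reference, order=32, mu=0.5, eps=1e-8):
    """Remove from `primary` the component linearly predictable from
    `reference`, using an NLMS adaptive filter (illustrative choice)."""
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    n = min(len(primary), len(reference))
    w = np.zeros(order)
    buf = np.zeros(order)
    out = np.zeros(n)
    for i in range(n):
        buf = np.roll(buf, 1)
        buf[0] = reference[i]
        est = w @ buf                 # component correlated with reference
        err = primary[i] - est        # residual after removal
        w = w + mu * err * buf / (buf @ buf + eps)
        out[i] = err
    return out

def two_stage_cancel(primary, noise_reference):
    # Stage A: cancel the speech that leaked into the noise reference,
    # using the speech-dominant primary signal, to obtain a noise estimate.
    noise_estimate = nlms_cancel(noise_reference, primary)
    # Stage B: cancel that noise estimate from the primary signal
    # to obtain the desired speech signal.
    return nlms_cancel(primary, noise_estimate)
```

Under these assumptions, `two_stage_cancel(first_audio, second_audio)` would correspond to the first processing module producing the third audio signal, and `two_stage_cancel(third_audio, target_noise_reference)` to the second processing module producing the target voice signal.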
In one possible implementation, the apparatus further includes: a sending module, configured to send the target voice signal so that a communication peer executes a command corresponding to the target voice signal; a playing module, configured to play the target voice signal; and a generating module, configured to generate an audio file based on the target voice signal.
Fig. 8 shows a block diagram of an electronic device 800 provided by an exemplary embodiment of the application. The electronic device 800 may be a portable mobile electronic device such as a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The electronic device 800 may also be referred to by other names such as user equipment, portable electronic device, laptop electronic device, or desktop electronic device.
Generally, the electronic device 800 includes: a processor 801 and a memory 802.
The processor 801 may include one or more processing cores, for example a 4-core or 8-core processor. The processor 801 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array). The processor 801 may also include a main processor and a coprocessor: the main processor, also referred to as a CPU (Central Processing Unit), processes data in the awake state, while the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 801 may integrate a GPU (Graphics Processing Unit) responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 801 may further include an AI (Artificial Intelligence) processor for computing operations related to machine learning.
The memory 802 may include one or more computer-readable storage media, which may be non-transitory. The memory 802 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 802 is used to store at least one instruction, which is executed by the processor 801 to implement the noise reduction method for an audio signal provided by the method embodiments of the present application.
In some embodiments, the electronic device 800 may further optionally include: a peripheral interface 803, and at least one peripheral. The processor 801, the memory 802, and the peripheral interface 803 may be connected by a bus or signal line. Individual peripheral devices may be connected to the peripheral device interface 803 by buses, signal lines, or a circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 804, a display 805, a camera assembly 806, audio circuitry 807, a positioning assembly 808, and a power supply 809.
The peripheral interface 803 may be used to connect at least one Input/Output (I/O) related peripheral to the processor 801 and the memory 802. In some embodiments, the processor 801, the memory 802, and the peripheral interface 803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 801, the memory 802, and the peripheral interface 803 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 804 is configured to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuit 804 communicates with communication networks and other communication devices via electromagnetic signals, converting an electrical signal into an electromagnetic signal for transmission or converting a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 804 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. The radio frequency circuit 804 may communicate with other electronic devices via at least one wireless communication protocol, including but not limited to: the World Wide Web, metropolitan area networks, intranets, mobile communication networks of each generation (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 804 may further include NFC (Near Field Communication) related circuits, which is not limited by the present application.
The display 805 is used to display a UI (User Interface), which may include graphics, text, icons, video, and any combination thereof. When the display 805 is a touch display, it can also collect touch signals on or above its surface; such a touch signal may be input to the processor 801 as a control signal for processing. In this case, the display 805 may also provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display 805 disposed on the front panel of the electronic device 800; in other embodiments, there may be at least two displays 805 disposed on different surfaces of the electronic device 800 or in a folded design; in still other embodiments, the display 805 may be a flexible display disposed on a curved or folded surface of the electronic device 800. The display 805 may even be arranged in an irregular, non-rectangular pattern, i.e., an irregularly shaped screen. The display 805 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or other materials.
The camera assembly 806 is used to capture images or video. Optionally, the camera assembly 806 includes a front camera and a rear camera. In general, the front camera is disposed on the front panel of the electronic device and the rear camera is disposed on the rear surface. In some embodiments, there are at least two rear cameras, each being one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so as to implement background blurring by fusing the main camera and the depth-of-field camera, panoramic and Virtual Reality (VR) shooting by fusing the main camera and the wide-angle camera, or other fusion shooting functions. In some embodiments, the camera assembly 806 may also include a flash, which may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash combines a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
Audio circuitry 807 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and the environment, converting the sound waves into electric signals, inputting the electric signals to the processor 801 for processing, or inputting the electric signals to the radio frequency circuit 804 for voice communication. For purposes of stereo acquisition or noise reduction, the microphone may be multiple and separately disposed at different locations of the electronic device 800. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 801 or the radio frequency circuit 804 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, audio circuit 807 may also include a headphone jack.
The positioning component 808 is used to determine the current geographic location of the electronic device 800 for navigation or LBS (Location Based Services). The positioning component 808 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 809 is used to power the various components in the electronic device 800. The power supply 809 may be an alternating current, direct current, disposable battery, or rechargeable battery. When the power supply 809 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the electronic device 800 also includes one or more sensors 810. The one or more sensors 810 include, but are not limited to: acceleration sensor 811, gyroscope sensor 812, pressure sensor 813, fingerprint sensor 814, optical sensor 815, and proximity sensor 816.
The acceleration sensor 811 can detect the magnitudes of accelerations on three coordinate axes of the coordinate system established with the electronic device 800. For example, the acceleration sensor 811 may be used to detect components of gravitational acceleration in three coordinate axes. The processor 801 may control the display screen 805 to display a user interface in a landscape view or a portrait view based on the gravitational acceleration signal acquired by the acceleration sensor 811. Acceleration sensor 811 may also be used for the acquisition of motion data of a game or user.
The gyro sensor 812 may detect a body direction and a rotation angle of the electronic device 800, and the gyro sensor 812 may collect a 3D motion of the user on the electronic device 800 in cooperation with the acceleration sensor 811. The processor 801 may implement the following functions based on the data collected by the gyro sensor 812: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.
The pressure sensor 813 may be disposed at a side frame of the electronic device 800 and/or at an underlying layer of the display 805. When the pressure sensor 813 is disposed on a side frame of the electronic device 800, a grip signal of the electronic device 800 by a user may be detected, and the processor 801 performs left-right hand recognition or shortcut operation according to the grip signal collected by the pressure sensor 813. When the pressure sensor 813 is disposed at the lower layer of the display screen 805, the processor 801 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 805. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 814 is used to collect a fingerprint of a user, and the processor 801 identifies the identity of the user based on the fingerprint collected by the fingerprint sensor 814, or the fingerprint sensor 814 identifies the identity of the user based on the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 801 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying for and changing settings, etc. The fingerprint sensor 814 may be disposed on the front, back, or side of the electronic device 800. When a physical key or vendor Logo is provided on the electronic device 800, the fingerprint sensor 814 may be integrated with the physical key or vendor Logo.
The optical sensor 815 is used to collect the ambient light intensity. In one embodiment, the processor 801 may control the display brightness of the display screen 805 based on the intensity of ambient light collected by the optical sensor 815. Specifically, when the intensity of the ambient light is high, the display brightness of the display screen 805 is turned up; when the ambient light intensity is low, the display brightness of the display screen 805 is turned down. In another embodiment, the processor 801 may also dynamically adjust the shooting parameters of the camera module 806 based on the ambient light intensity collected by the optical sensor 815.
A proximity sensor 816, also referred to as a distance sensor, is typically provided on the front panel of the electronic device 800. The proximity sensor 816 is used to collect the distance between the user and the front of the electronic device 800. In one embodiment, when the proximity sensor 816 detects that the distance between the user and the front of the electronic device 800 gradually decreases, the processor 801 controls the display 805 to switch from the bright screen state to the off screen state; when the proximity sensor 816 detects that the distance between the user and the front surface of the electronic device 800 gradually increases, the processor 801 controls the display 805 to switch from the off-screen state to the on-screen state.
Those skilled in the art will appreciate that the structure shown in fig. 8 is not limiting; the electronic device may include more or fewer components than shown, combine certain components, or adopt a different arrangement of components.
An embodiment of the present application provides a computer-readable storage medium having at least one program code stored therein, the at least one program code being loaded and executed by a processor to implement operations performed by a noise reduction method for an audio signal as described above.
Embodiments of the present application provide a computer program product or a computer program comprising computer program code stored in a computer readable storage medium. The processor of the electronic device reads the computer program code from the computer readable storage medium and executes the computer program code such that the electronic device performs the operations performed by the above-described noise reduction method of an audio signal.
In some embodiments, a computer program according to an embodiment of the present application may be deployed to be executed on one electronic device, or on a plurality of electronic devices located at one site, or on a plurality of electronic devices distributed at a plurality of sites and interconnected by a communication network, where the plurality of electronic devices distributed at the plurality of sites and interconnected by the communication network may constitute a blockchain system.
In the embodiment of the application, the target noise reference signal is obtained based on one of the first audio signal and the third audio signal together with the second audio signal. The target noise reference signal therefore contains both the noise signal in the second audio signal and the noise signal in the first audio signal that is not recorded in the second audio signal. Performing noise reduction on the third audio signal based on this target noise reference signal effectively removes the noise signal in the third audio signal and improves the purity of the noise-reduced target voice signal.
The foregoing description covers only preferred embodiments of the present application and is not intended to limit the application; the scope of protection is defined by the appended claims.
Claims (11)
1. A method of noise reduction of an audio signal, comprising:
Determining a second speech signal in the first audio signal; removing the second voice signal in the second audio signal to obtain a second noise signal; removing the second noise signal in the first audio signal to obtain a third audio signal, wherein the first audio signal and the second audio signal are audio signals with coherence, the first audio signal and the second audio signal are acquired based on two audio acquisition components of the same electronic device, the energy of the voice signal in the first audio signal is larger than the energy of the voice signal in the second audio signal, the energy of the noise signal in the second audio signal is larger than the energy of the noise signal in the first audio signal, and the third audio signal comprises the voice signal in the first audio signal and comprises other noise signals except the second noise signal in the second audio signal;
Determining a target fusion coefficient based on the first audio signal, the second audio signal, and the third audio signal; fusing one of the first audio signal and the third audio signal with the second audio signal based on the target fusion coefficient to obtain a target noise reference signal, wherein the target noise reference signal comprises a voice signal and a noise signal in the first audio signal and comprises a voice signal and a noise signal in the second audio signal;
and carrying out noise reduction processing on the third audio signal based on the target noise reference signal so as to remove other noise signals in the third audio signal and obtain a target voice signal.
2. The method of noise reduction of an audio signal according to claim 1, wherein the determining a target fusion coefficient based on the first audio signal, the second audio signal, and the third audio signal comprises:
Respectively determining a first spectrogram of the first audio signal and a second spectrogram of the second audio signal, wherein the abscissa of the first spectrogram and the second spectrogram is frequency, and the ordinate of the first spectrogram and the second spectrogram is decibel;
Determining a coherent frequency band in the first spectrogram, wherein the coherent frequency band is a frequency band in which the decibel difference between the coherent frequency band and the second spectrogram in the first spectrogram is smaller than a preset decibel value;
The target fusion coefficient is determined based on the coherent frequency band, the first audio signal, and the third audio signal.
3. The method of noise reduction of an audio signal according to claim 2, wherein said determining the target fusion coefficient based on the coherent frequency band, the first audio signal and the third audio signal comprises:
determining that the minimum frequency of the coherent frequency band is not lower than a preset frequency;
Respectively determining first energy and second energy, wherein the first energy is the energy sum of the first audio signal in the coherent frequency band, and the second energy is the energy sum of the third audio signal in the coherent frequency band;
determining a quotient of the second energy and the first energy as the target fusion coefficient.
4. The method of noise reduction of an audio signal according to claim 2, wherein said determining the target fusion coefficient based on the coherent frequency band, the first audio signal and the third audio signal comprises:
determining that the minimum frequency of the coherent frequency band is lower than a preset frequency;
respectively determining first energy, second energy and third energy, wherein the first energy is the energy sum of the first audio signal in the coherent frequency band, the second energy is the energy sum of the third audio signal in the coherent frequency band, the third energy is the energy sum of the first audio signal in a full frequency band, and the full frequency band is a frequency band included by the first spectrogram;
Determining a quotient of the second energy and the first energy as a first fusion coefficient;
Determining a quotient of the first energy and the third energy as a second fusion coefficient;
and determining the average value of the first fusion coefficient and the second fusion coefficient as the target fusion coefficient.
5. The method for noise reduction of an audio signal according to claim 1, wherein the fusing one of the first audio signal and the third audio signal with the second audio signal based on the target fusion coefficient to obtain a target noise reference signal comprises:
Generating first relation data based on the target fusion coefficient, wherein the first relation data is relation data that takes the target fusion coefficient as a parameter, takes one of the first audio signal and the third audio signal together with the second audio signal as independent variables, and takes the target noise reference signal as a dependent variable;
and fusing one of the first audio signal and the third audio signal with the second audio signal through the first relation data to obtain the target noise reference signal.
6. The method for noise reduction of an audio signal according to any one of claims 1 to 5, wherein the noise reduction processing is performed on the third audio signal based on the target noise reference signal to remove other noise signals in the third audio signal, so as to obtain a target speech signal, and the method comprises:
determining a first speech signal in the third audio signal;
Removing the first voice signal in the target noise reference signal to obtain a first noise signal;
And removing the first noise signal in the third audio signal to obtain the target voice signal.
7. The method of noise reduction of an audio signal according to claim 6, wherein said determining a first speech signal in the third audio signal comprises:
determining a fourth audio signal from the third audio signal, wherein the fourth audio signal is a signal containing a noise signal in the third audio signal;
And taking the voice signal in the fourth audio signal as the first voice signal.
8. The method of noise reduction of an audio signal according to claim 1, further comprising:
Transmitting the target voice signal so that a communication peer executes a command corresponding to the target voice signal; or
Playing the target voice signal; or
An audio file is generated based on the target speech signal.
9. A noise reduction device for an audio signal, the device comprising:
A first processing module for determining a second speech signal in the first audio signal; removing the second voice signal in the second audio signal to obtain a second noise signal; removing the second noise signal in the first audio signal to obtain a third audio signal, wherein the first audio signal and the second audio signal are audio signals with coherence, the first audio signal and the second audio signal are acquired based on two audio acquisition components of the same electronic device, the energy of the voice signal in the first audio signal is larger than the energy of the voice signal in the second audio signal, the energy of the noise signal in the second audio signal is larger than the energy of the noise signal in the first audio signal, and the third audio signal comprises the voice signal in the first audio signal and comprises other noise signals except the second noise signal in the second audio signal;
A determining module configured to determine a target fusion coefficient based on the first audio signal, the second audio signal, and the third audio signal; fusing one of the first audio signal and the third audio signal with the second audio signal based on the target fusion coefficient to obtain a target noise reference signal, wherein the target noise reference signal comprises a voice signal and a noise signal in the first audio signal and comprises a voice signal and a noise signal in the second audio signal;
And the second processing module is used for carrying out noise reduction processing on the third audio signal based on the target noise reference signal so as to remove other noise signals in the third audio signal and obtain a target voice signal.
10. An electronic device comprising one or more processors and one or more memories, the one or more memories having stored therein at least one program code loaded and executed by the one or more processors to perform the operations performed by the method of noise reduction of an audio signal as claimed in any of claims 1 to 8.
11. A computer readable storage medium having stored therein at least one program code, the at least one program code being loaded and executed by a processor to implement the operations performed by the method of noise reduction of an audio signal as claimed in any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110777962.8A CN113539291B (en) | 2021-07-09 | 2021-07-09 | Noise reduction method and device for audio signal, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110777962.8A CN113539291B (en) | 2021-07-09 | 2021-07-09 | Noise reduction method and device for audio signal, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113539291A CN113539291A (en) | 2021-10-22 |
CN113539291B true CN113539291B (en) | 2024-06-25 |
Family
ID=78098184
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110777962.8A Active CN113539291B (en) | 2021-07-09 | 2021-07-09 | Noise reduction method and device for audio signal, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113539291B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111402913A (en) * | 2020-02-24 | 2020-07-10 | 北京声智科技有限公司 | Noise reduction method, device, equipment and storage medium |
Family Cites Families (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5374845B2 (en) * | 2007-07-25 | 2013-12-25 | 日本電気株式会社 | Noise estimation apparatus and method, and program |
CN102347027A (en) * | 2011-07-07 | 2012-02-08 | 瑞声声学科技(深圳)有限公司 | Double-microphone speech enhancer and speech enhancement method thereof |
US9685171B1 (en) * | 2012-11-20 | 2017-06-20 | Amazon Technologies, Inc. | Multiple-stage adaptive filtering of audio signals |
CN104754430A (en) * | 2013-12-30 | 2015-07-01 | 重庆重邮信科通信技术有限公司 | Noise reduction device and method for terminal microphone |
US9978388B2 (en) * | 2014-09-12 | 2018-05-22 | Knowles Electronics, Llc | Systems and methods for restoration of speech components |
US9886966B2 (en) * | 2014-11-07 | 2018-02-06 | Apple Inc. | System and method for improving noise suppression using logistic function and a suppression target value for automatic speech recognition |
CN105225672B (en) * | 2015-08-21 | 2019-02-22 | 胡旻波 | Merge the system and method for the dual microphone orientation noise suppression of fundamental frequency information |
CN107945815B (en) * | 2017-11-27 | 2021-09-07 | 歌尔科技有限公司 | Voice signal noise reduction method and device |
WO2019112468A1 (en) * | 2017-12-08 | 2019-06-13 | Huawei Technologies Co., Ltd. | Multi-microphone noise reduction method, apparatus and terminal device |
CN109767783B (en) * | 2019-02-15 | 2021-02-02 | 深圳市汇顶科技股份有限公司 | Voice enhancement method, device, equipment and storage medium |
US11011182B2 (en) * | 2019-03-25 | 2021-05-18 | Nxp B.V. | Audio processing system for speech enhancement |
CN110010143B (en) * | 2019-04-19 | 2020-06-09 | 出门问问信息科技有限公司 | Voice signal enhancement system, method and storage medium |
TWI738532B (en) * | 2019-10-27 | 2021-09-01 | 英屬開曼群島商意騰科技股份有限公司 | Apparatus and method for multiple-microphone speech enhancement |
CN110853664B (en) * | 2019-11-22 | 2022-05-06 | 北京小米移动软件有限公司 | Method and device for evaluating performance of speech enhancement algorithm and electronic equipment |
CN110856072B (en) * | 2019-12-04 | 2021-03-19 | 北京声加科技有限公司 | Earphone conversation noise reduction method and earphone |
CN111063366A (en) * | 2019-12-26 | 2020-04-24 | 紫光展锐(重庆)科技有限公司 | Method and device for reducing noise, electronic equipment and readable storage medium |
CN111402918B (en) * | 2020-03-20 | 2023-08-08 | 北京达佳互联信息技术有限公司 | Audio processing method, device, equipment and storage medium |
CN111986691B (en) * | 2020-09-04 | 2024-02-02 | 腾讯科技(深圳)有限公司 | Audio processing method, device, computer equipment and storage medium |
CN112242149B (en) * | 2020-12-03 | 2021-03-26 | 北京声智科技有限公司 | Audio data processing method and device, earphone and computer readable storage medium |
CN112735461B (en) * | 2020-12-29 | 2024-06-07 | 西安讯飞超脑信息科技有限公司 | Pickup method, and related device and equipment |
CN112863535B (en) * | 2021-01-05 | 2022-04-26 | 中国科学院声学研究所 | Residual echo and noise elimination method and device |
CN113035167A (en) * | 2021-01-28 | 2021-06-25 | 广州朗国电子科技有限公司 | Audio frequency tuning method and storage medium for active noise reduction |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111402913A (en) * | 2020-02-24 | 2020-07-10 | 北京声智科技有限公司 | Noise reduction method, device, equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
Research on speech enhancement algorithm based on microphone array; Yan Shu et al.; Process Automation Instrumentation (自动化仪表); Vol. 40, No. 9; pp. 59-62 *
Also Published As
Publication number | Publication date |
---|---|
CN113539291A (en) | 2021-10-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110764730B (en) | Method and device for playing audio data | |
CN108401124B (en) | Video recording method and device | |
CN111402913B (en) | Noise reduction method, device, equipment and storage medium | |
CN108965757B (en) | Video recording method, device, terminal and storage medium | |
CN108922506A (en) | Song audio generation method, device and computer readable storage medium | |
CN111276122B (en) | Audio generation method and device and storage medium | |
CN111681655A (en) | Voice control method and device, electronic equipment and storage medium | |
CN109102811B (en) | Audio fingerprint generation method and device and storage medium | |
CN110797042B (en) | Audio processing method, device and storage medium | |
CN110868642B (en) | Video playing method, device and storage medium | |
CN110473562B (en) | Audio data processing method, device and system | |
CN114384466A (en) | Sound source direction determining method, sound source direction determining device, electronic equipment and storage medium | |
CN110992954A (en) | Method, device, equipment and storage medium for voice recognition | |
CN109448676B (en) | Audio processing method, device and storage medium | |
CN113539291B (en) | Noise reduction method and device for audio signal, electronic equipment and storage medium | |
CN111711841B (en) | Image frame playing method, device, terminal and storage medium | |
CN111798863B (en) | Method and device for eliminating echo, electronic equipment and readable storage medium | |
CN114388001A (en) | Multimedia file playing method, device, equipment and storage medium | |
CN110660031B (en) | Image sharpening method and device and storage medium | |
CN114594885A (en) | Application icon management method, device and equipment and computer readable storage medium | |
CN111063372B (en) | Method, device and equipment for determining pitch characteristics and storage medium | |
CN114595019A (en) | Theme setting method, device and equipment of application program and storage medium | |
CN108877845B (en) | Song playing method and device | |
CN108881715B (en) | Starting method and device of shooting mode, terminal and storage medium | |
CN113592874A (en) | Image display method and device and computer equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |