
CN113539291B - Noise reduction method and device for audio signal, electronic equipment and storage medium

Info

Publication number
CN113539291B
Authority
CN
China
Prior art keywords
audio signal
signal
noise
target
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110777962.8A
Other languages
Chinese (zh)
Other versions
CN113539291A (en)
Inventor
李良斌
陈孝良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing SoundAI Technology Co Ltd
Original Assignee
Beijing SoundAI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing SoundAI Technology Co Ltd
Priority to CN202110777962.8A
Publication of CN113539291A
Application granted
Publication of CN113539291B
Active legal status (current)
Anticipated expiration

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G10L21/0216 - Noise filtering characterised by the method used for estimating noise
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G10L21/0216 - Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 - Processing in the frequency domain
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G10L21/0264 - Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M1/00 - Substation equipment, e.g. for use by subscribers
    • H04M1/72 - Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 - User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72448 - User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions
    • H04M1/72454 - User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions according to context-related or environment-related conditions
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G10L21/0216 - Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 - Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02165 - Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Environmental & Geological Engineering (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The present application provides a noise reduction method and device for an audio signal, an electronic device, and a storage medium, and belongs to the technical field of audio processing. The method comprises the following steps: performing noise reduction processing on a first audio signal based on the first audio signal and a second audio signal to obtain a third audio signal, wherein the first audio signal and the second audio signal are coherent audio signals; obtaining a target noise reference signal based on the second audio signal and one of the first audio signal and the third audio signal; and performing noise reduction processing on the third audio signal based on the target noise reference signal to obtain a target voice signal. The method improves the purity of the noise-reduced target voice signal.

Description

Noise reduction method and device for audio signal, electronic equipment and storage medium
Technical Field
The present application relates to the field of audio processing technologies, and in particular, to a method and apparatus for noise reduction of an audio signal, an electronic device, and a storage medium.
Background
Electronic devices commonly employ dual-microphone noise reduction techniques to reduce the noise in recorded audio signals. An electronic device using dual-microphone noise reduction is generally equipped with a primary microphone and a secondary microphone, and the audio signals recorded by the two microphones at the same time are coherent. In addition, the audio signal recorded by the primary microphone contains more voice signal and less noise signal, while the audio signal recorded by the secondary microphone contains more noise signal and less voice signal. The difference between the audio signals of the two microphones can therefore be used to reduce the noise in the audio signal recorded by the primary microphone.
In the related art, the voice signal contained in the secondary microphone's recording is estimated based on the audio signal recorded by the primary microphone; this voice signal is removed from the audio signal recorded by the secondary microphone to obtain the noise signal in that recording, and the noise signal is then removed from the audio signal recorded by the primary microphone to obtain a noise-reduced voice signal.
However, the audio signal recorded by the primary microphone also contains noise signals that the secondary microphone does not record, so the above method cannot remove all of the noise in the primary microphone's audio signal. Residual noise therefore remains in the resulting voice signal, and the purity of the noise-reduced voice signal is low.
Disclosure of Invention
The embodiments of the present application provide a noise reduction method and apparatus for an audio signal, an electronic device, and a storage medium. A target noise reference signal determined based on the second audio signal and one of the first audio signal and the third audio signal contains not only the noise signal in the second audio signal but also the noise signal in the first audio signal that the second audio signal does not record. Performing noise reduction processing on the third audio signal based on this target noise reference signal therefore effectively removes the noise signal in the third audio signal and improves the purity of the noise-reduced voice signal.
In one aspect, there is provided a method of noise reduction of an audio signal, the method comprising: noise reduction processing is carried out on the first audio signal based on the first audio signal and the second audio signal to obtain a third audio signal, wherein the first audio signal and the second audio signal are audio signals with coherence; obtaining a target noise reference signal based on the second audio signal and one of the first audio signal and the third audio signal; and carrying out noise reduction processing on the third audio signal based on the target noise reference signal to obtain a target voice signal.
In some embodiments, the first audio signal and the second audio signal are respectively audio signals with coherence that are recorded by two audio acquisition components.
In some embodiments, the noise signal of the first audio signal is less than the noise signal of the second audio signal.
In one possible implementation manner, the obtaining a target noise reference signal based on the second audio signal and one of the first audio signal and the third audio signal includes: and fusing one of the first audio signal and the third audio signal with the second audio signal to obtain the target noise reference signal.
In one possible implementation manner, the fusing one of the first audio signal and the third audio signal with the second audio signal to obtain the target noise reference signal includes: determining a target fusion coefficient based on the first audio signal, the second audio signal, and the third audio signal; and fusing one of the first audio signal and the third audio signal with the second audio signal based on the target fusion coefficient to obtain the target noise reference signal.
In one possible implementation, the determining the target fusion coefficient based on the first audio signal, the second audio signal, and the third audio signal includes: respectively determining a first spectrogram of the first audio signal and a second spectrogram of the second audio signal, wherein the abscissa of the first spectrogram and the second spectrogram is frequency, and the ordinate of the first spectrogram and the second spectrogram is decibel; determining a coherent frequency band in the first spectrogram, wherein the coherent frequency band is a frequency band in which the decibel difference between the coherent frequency band and the second spectrogram in the first spectrogram is smaller than a preset decibel value; the target fusion coefficient is determined based on the coherent frequency band, the first audio signal, and the third audio signal.
In one possible implementation manner, the determining the target fusion coefficient based on the coherent frequency band, the first audio signal and the third audio signal includes: determining that the minimum frequency of the coherent frequency band is not lower than a preset frequency; respectively determining a first energy and a second energy, wherein the first energy is the energy sum of the first audio signal in the coherent frequency band, and the second energy is the energy sum of the third audio signal in the coherent frequency band; and determining a quotient of the second energy and the first energy as the target fusion coefficient.
In one possible implementation manner, the determining the target fusion coefficient based on the coherent frequency band, the first audio signal and the third audio signal includes: determining that the minimum frequency of the coherent frequency band is lower than a preset frequency; respectively determining first energy, second energy and third energy, wherein the first energy is the energy sum of the first audio signal in the coherent frequency band, the second energy is the energy sum of the third audio signal in the coherent frequency band, the third energy is the energy sum of the first audio signal in a full frequency band, and the full frequency band is a frequency band included by the first spectrogram; determining a quotient of the second energy and the first energy as a first fusion coefficient; determining a quotient of the first energy and the third energy as a second fusion coefficient; and determining the average value of the first fusion coefficient and the second fusion coefficient as the target fusion coefficient.
In one possible implementation manner, the fusing, based on the target fusion coefficient, one of the first audio signal and the third audio signal and the second audio signal to obtain the target noise reference signal includes: generating first relation data based on the target fusion coefficient, wherein the first relation data is relation data with parameters of the target fusion coefficient, independent variables of one audio signal of the first audio signal and the third audio signal, the second audio signal and the dependent variables of the target noise reference signal; and fusing one of the first audio signal and the third audio signal with the second audio signal through the first relation data to obtain the target noise reference signal.
In one possible implementation manner, the performing noise reduction processing on the third audio signal based on the target noise reference signal to obtain a target voice signal includes: determining a first speech signal in the third audio signal; removing the first voice signal in the target noise reference signal to obtain a first noise signal; and removing the first noise signal in the third audio signal to obtain the target voice signal.
In one possible implementation, the determining the first speech signal in the third audio signal includes: determining a fourth audio signal from the third audio signal, wherein the fourth audio signal is a signal containing a noise signal in the third audio signal; and taking the voice signal in the fourth audio signal as the first voice signal.
In one possible implementation manner, the noise reduction processing is performed on the first audio signal based on the first audio signal and the second audio signal to obtain a third audio signal, including: determining a second speech signal in the first audio signal; removing the second voice signal in the second audio signal to obtain a second noise signal; and removing the second noise signal in the first audio signal to obtain the third audio signal.
In one possible implementation, the method further includes: transmitting the target voice signal to enable the communication opposite terminal to execute a command corresponding to the target voice signal; or playing the target voice signal; or generating an audio file based on the target speech signal.
In another aspect, there is provided a noise reduction apparatus for an audio signal, the apparatus comprising:
The first processing module is used for carrying out noise reduction processing on the first audio signal based on the first audio signal and the second audio signal to obtain a third audio signal, wherein the first audio signal and the second audio signal are audio signals with coherence; a determining module for obtaining a target noise reference signal based on the second audio signal and one of the first audio signal and the third audio signal; and the second processing module is used for carrying out noise reduction processing on the third audio signal based on the target noise reference signal to obtain a target voice signal.
In some possible embodiments, the first audio signal and the second audio signal are respectively audio signals with coherence recorded by the two audio acquisition components.
In some possible embodiments, the noise signal of the first audio signal is less than the noise signal of the second audio signal.
In one possible implementation, the determining module includes: and the fusion unit is used for fusing one of the first audio signal and the third audio signal with the second audio signal to obtain the target noise reference signal.
In one possible implementation, the fusion unit includes: a determining subunit configured to determine a target fusion coefficient based on the first audio signal, the second audio signal, and the third audio signal; and the fusion subunit is used for fusing one of the first audio signal and the third audio signal with the second audio signal based on the target fusion coefficient to obtain the target noise reference signal.
In a possible implementation manner, the determining subunit is configured to: respectively determining a first spectrogram of the first audio signal and a second spectrogram of the second audio signal, wherein the abscissa of the first spectrogram and the second spectrogram is frequency, and the ordinate of the first spectrogram and the second spectrogram is decibel; determining a coherent frequency band in the first spectrogram, wherein the coherent frequency band is a frequency band in which the decibel difference between the coherent frequency band and the second spectrogram in the first spectrogram is smaller than a preset decibel value; the target fusion coefficient is determined based on the coherent frequency band, the first audio signal, and the third audio signal.
In a possible implementation manner, the determining subunit is configured to: determining that the minimum frequency of the coherent frequency band is not lower than a preset frequency; respectively determining first energy and second energy, wherein the first energy is the energy sum of the first audio signal in the coherent frequency band, and the second energy is the energy sum of the third audio signal in the coherent frequency band; determining a quotient of the second energy and the first energy as the target fusion coefficient.
In a possible implementation manner, the determining subunit is configured to: determining that the minimum frequency of the coherent frequency band is lower than a preset frequency; respectively determining first energy, second energy and third energy, wherein the first energy is the energy sum of the first audio signal in the coherent frequency band, the second energy is the energy sum of the third audio signal in the coherent frequency band, the third energy is the energy sum of the first audio signal in a full frequency band, and the full frequency band is a frequency band included by the first spectrogram; determining a quotient of the second energy and the first energy as a first fusion coefficient; determining a quotient of the first energy and the third energy as a second fusion coefficient; and determining the average value of the first fusion coefficient and the second fusion coefficient as the target fusion coefficient.
In one possible implementation, the fusion subunit is configured to: generating first relation data based on the target fusion coefficient, wherein the first relation data is relation data with parameters of the target fusion coefficient, independent variables of one audio signal of the first audio signal and the third audio signal, the second audio signal and the dependent variables of the target noise reference signal; and fusing one of the first audio signal and the third audio signal with the second audio signal through the first relation data to obtain the target noise reference signal.
In one possible implementation manner, the second processing module includes: a determining unit configured to determine a first speech signal in the third audio signal; a first removing unit, configured to remove the first speech signal in the target noise reference signal, to obtain a first noise signal; and the second removing unit is used for removing the first noise signal in the third audio signal to obtain the target voice signal.
In a possible implementation manner, the determining unit is configured to: determining a fourth audio signal from the third audio signal, wherein the fourth audio signal is a signal containing a noise signal in the third audio signal; and taking the voice signal in the fourth audio signal as the first voice signal.
In one possible implementation manner, the first processing module is configured to: determining a second speech signal in the first audio signal; removing the second voice signal in the second audio signal to obtain a second noise signal; and removing the second noise signal in the first audio signal to obtain the third audio signal.
In one possible implementation, the apparatus further includes: the sending module is used for sending the target voice signal to enable the communication opposite terminal to execute a command corresponding to the target voice signal; the playing module is used for playing the target voice signal; and the generating module is used for generating an audio file based on the target voice signal.
In another aspect, an electronic device is provided that includes one or more processors and one or more memories having stored therein at least one program code loaded and executed by the one or more processors to perform operations as performed by the method of noise reduction of an audio signal as described above.
In another aspect, a computer readable storage medium having stored therein at least one program code loaded and executed by a processor to perform operations performed by a method for noise reduction of an audio signal as described above is provided.
In another aspect, a computer program product or a computer program is provided, the computer program product or the computer program comprising computer program code, the computer program code being stored in a computer readable storage medium. The processor of the electronic device reads the computer program code from the computer readable storage medium and executes the computer program code such that the electronic device performs the operations performed by the above-described noise reduction method of an audio signal.
The technical scheme provided by the embodiment of the application has the beneficial effects that at least:
In the embodiment of the application, the target noise reference signal is obtained based on one of the first audio signal and the third audio signal and the second audio signal, so that the target noise reference signal comprises the noise signal in the second audio signal and the noise signal in the first audio signal which is not recorded by the second audio signal, and further the noise reduction processing is carried out on the third audio signal based on the target noise reference signal, so that the noise signal in the third audio signal can be effectively removed, and the purity of the target voice signal after noise reduction is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of an implementation environment of a noise reduction method for an audio signal according to an embodiment of the present application;
fig. 2 is a flowchart of a method for noise reduction of an audio signal according to an embodiment of the present application;
FIG. 3 is a graph of a frequency spectrum of an audio signal according to an embodiment of the present application;
FIG. 4 is a graph of a frequency spectrum of an audio signal according to an embodiment of the present application;
FIG. 5 is a flow chart of a noise reduction process provided by an embodiment of the present application;
FIG. 6 is a flow chart of a noise reduction process provided by an embodiment of the present application;
Fig. 7 is a schematic diagram of a noise reduction device for an audio signal according to an embodiment of the present application;
Fig. 8 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
The terms "first," "second," "third," and "fourth" and the like in the description and in the claims and drawings are used for distinguishing between different objects and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprising," "including," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
The method for reducing noise of the audio signal provided by the embodiment of the application can be applied to electronic equipment, and the electronic equipment can collect the first audio signal and the second audio signal, wherein the first audio signal and the second audio signal are coherent audio signals, and the noise signal of the first audio signal is less than that of the second audio signal.
In some embodiments, the electronic device may include two audio acquisition components, and the first audio signal and the second audio signal are recorded by these two audio acquisition components; then, by the method provided by the embodiment of the present application, noise reduction processing is performed on the first audio signal based on the first audio signal and the second audio signal to obtain the target voice signal. The audio acquisition component may be a microphone; the electronic device may comprise two microphones, namely a primary microphone and a secondary microphone.
Several scenarios of the noise reduction method of the audio signal are described below by way of example.
First: the noise reduction method of the audio signal can be applied to a scene of voice control, and the electronic equipment is a voice control terminal, such as a mobile phone, a landline phone, an interphone and the like; accordingly, referring to fig. 1, an implementation environment of the noise reduction method of an audio signal includes an electronic device 10 and a communication counterpart 20. In the process of performing voice control on the communication opposite terminal 20, the electronic device 10 performs noise reduction according to the method provided by the embodiment of the present application to obtain a target voice signal, and sends the target voice signal to the communication opposite terminal 20, so that the communication opposite terminal executes a command corresponding to the target voice signal, thereby improving the quality of performing voice control on the communication opposite terminal 20 by the electronic device 10.
Second: the noise reduction method of the audio signal can be applied to a scene of playing audio, and the electronic device is a playing terminal, such as a microphone, an earphone, and the like; in the process of playing the audio, the electronic device performs noise reduction according to the method provided by the embodiment of the present application to obtain the target voice signal and plays the target voice signal, thereby improving the quality of the audio output by the playing terminal.
Third: the noise reduction method of the audio signal can be applied to a scene of recording audio, and the electronic device is a recording terminal, such as a recorder, a recording pen, a video camera, and the like; in the process of recording the audio, the electronic device performs noise reduction according to the method provided by the embodiment of the present application to obtain the target voice signal and generates an audio file based on the target voice signal, thereby improving the quality of the audio recorded by the recording terminal.
An embodiment of the present application provides a method for noise reduction of an audio signal, where the method steps may be performed by an electronic device, referring to fig. 2, and the method includes:
step 201: a first audio signal and a second audio signal are acquired.
The first audio signal and the second audio signal are audio signals with coherence respectively recorded by the two audio acquisition components. The two audio acquisition components are two audio acquisition components of the same electronic equipment; for example, the two audio capturing components may be a primary microphone and a secondary microphone, respectively.
It should be noted that the first audio signal and the second audio signal both include a voice signal and a noise signal. The energy of the voice signal in the first audio signal is greater than the energy of the voice signal in the second audio signal, i.e. the first audio signal contains more voice signal than the second audio signal. The energy of the noise signal in the second audio signal is greater than the energy of the noise signal in the first audio signal, i.e. the second audio signal contains more noise signal than the first audio signal.
Step 202: and carrying out noise reduction processing on the first audio signal based on the first audio signal and the second audio signal to obtain a third audio signal.
This step can be achieved by the following steps (1) - (3):
(1) A second speech signal in the first audio signal is determined.
Wherein the first audio signal comprises a voice signal and a noise signal; and filtering the first audio signal, and eliminating the noise signal in the first audio signal to obtain a second voice signal. The first audio signal is filtered by the first adaptive filter.
(2) And removing the second voice signal in the second audio signal to obtain a second noise signal.
Wherein the second audio signal comprises a voice signal and a noise signal; in the embodiment of the present application, after the second voice signal included in the second audio signal is removed, a clean second noise signal that does not include a voice signal is obtained.
(3) And removing the second noise signal in the first audio signal to obtain a third audio signal.
The first audio signal comprises a voice signal, the second noise signal, and other noise signals apart from the second noise signal. The second noise signal is filtered to obtain a filtered second noise signal, and the filtered second noise signal is removed from the first audio signal to obtain the third audio signal. The second noise signal is filtered by the second adaptive filter.
It should be noted that, since the first audio signal includes the voice signal, the second noise signal, and other noise signals, removing the second noise signal from the first audio signal leaves a third audio signal that still includes the voice signal and the other noise signals.
Referring to fig. 3, the upper half of fig. 3 is a partial spectrogram of a first audio signal, and the lower half of fig. 3 is a partial spectrogram of a second audio signal. The boxed portion in the figure marks other noise signals that are included in the first audio signal but not in the second audio signal; it can be seen that the first audio signal contains noise signals that the second audio signal does not, that is, the second audio signal does not include all of the noise signals in the first audio signal.
Referring to fig. 4, the upper half of fig. 4 is a partial spectrogram of a third audio signal, the middle part is a partial spectrogram of a first audio signal, and the lower half is a partial spectrogram of a second audio signal. The boxed portion in the figure marks other noise signals that are included in the first audio signal but not in the second audio signal; that is, after the second noise signal in the first audio signal is removed, the resulting third audio signal still includes these other noise signals.
Referring to fig. 5, fig. 5 is a flowchart of performing noise reduction processing on the first audio signal based on the first audio signal and the second audio signal to obtain a third audio signal. And filtering the first audio signal through a first adaptive filter to eliminate noise signals in the first audio signal and obtain a second voice signal. And removing the second voice signal from the second audio signal to obtain a purer second noise signal. Filtering the second noise signal to obtain a filtered second noise signal; and removing the second noise signal after the filtering processing from the first audio signal to obtain a third audio signal.
In the embodiment of the present application, noise reduction processing is performed on the first audio signal based on the first audio signal and the second audio signal, so that the second noise signal in the first audio signal is removed and the noise in the first audio signal is reduced, which facilitates the subsequent noise reduction processing of the third audio signal, from which the second noise signal has already been removed.
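As a concrete illustration of this first stage, the sketch below follows the Fig. 5 flow with a normalized LMS (NLMS) adaptive filter standing in for the first and second adaptive filters. The filter type, order, and step size are assumptions of this sketch; the embodiment only states that adaptive filters are used. The inputs are equal-length mono arrays captured simultaneously by the primary and secondary microphones.

```python
import numpy as np

def nlms(x, d, order=64, mu=0.5, eps=1e-8):
    """Normalized LMS adaptive filter: adapts weights so that the filtered
    reference x approximates the desired signal d.
    Returns (y, e), where y is the filtered reference and e = d - y."""
    w = np.zeros(order)
    y = np.zeros(len(d))
    e = np.zeros(len(d))
    for n in range(len(d)):
        xs = x[max(0, n - order + 1):n + 1][::-1]
        xs = np.pad(xs, (0, order - len(xs)))
        y[n] = w @ xs
        e[n] = d[n] - y[n]
        w += mu * e[n] * xs / (eps + xs @ xs)
    return y, e

def first_stage_noise_reduction(first, second, order=64):
    """Sketch of step 202 (cf. Fig. 5): returns the third audio signal."""
    # First adaptive filter: from the first (primary-mic) signal, estimate the
    # voice component present in the second (secondary-mic) signal; the
    # residual is the second noise signal, a cleaner noise reference.
    second_voice, second_noise = nlms(first, second, order)
    # Second adaptive filter: estimate the noise in the first signal from that
    # reference and subtract it; the residual is the third audio signal.
    _, third = nlms(second_noise, first, order)
    return third
```

The same `nlms` helper is reused in the step 204 sketch further below.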
Step 203: a target noise reference signal is derived based on the second audio signal and one of the first audio signal and the third audio signal.
This step may be accomplished by:
Fusing one of the first audio signal and the third audio signal with the second audio signal to obtain a target noise reference signal; thus, the target noise reference signal includes both the speech signal and the noise signal in the first audio signal and the speech signal and the noise signal in the second audio signal.
In one possible implementation, the third audio signal and the second audio signal are fused to obtain the target noise reference signal. In the embodiment of the application, the third audio signal and the second audio signal are both audio signals comprising voice signals and noise signals, and because the third audio signal is the audio signal for removing part of the noise signals, the third audio signal and the second audio signal are fused to obtain the target noise reference signal, thereby facilitating subsequent rapid and efficient processing of the third audio signal.
In another possible implementation, the first audio signal and the second audio signal are fused to obtain the target noise reference signal. In the embodiment of the present application, the first audio signal and the second audio signal are both audio signals that include a voice signal and a noise signal, and directly fusing the first audio signal with the second audio signal to obtain the target noise reference signal is simple and saves processing effort.
Wherein fusing the second audio signal and one of the first audio signal and the third audio signal may be achieved by the following steps (1) - (2):
(1) A target fusion coefficient is determined based on the first audio signal, the second audio signal, and the third audio signal.
This step can be achieved by the following steps A1-A3:
a1: and respectively determining a first spectrogram of the first audio signal and a second spectrogram of the second audio signal, wherein the abscissa of the first spectrogram and the second spectrogram is frequency, and the ordinate of the first spectrogram and the second spectrogram is decibel.
The method comprises the steps of carrying out Fourier transform on a first audio signal to generate a first spectrogram, and carrying out Fourier transform on a second audio signal to generate a second spectrogram.
In the embodiment of the application, the coherence between the first audio signal and the second audio signal can be simply and intuitively determined through the first spectrogram and the second spectrogram.
A2: and determining a coherent frequency band in the first spectrogram, wherein the coherent frequency band is a frequency band in which the decibel difference between the coherent frequency band and the second spectrogram in the first spectrogram is smaller than a preset decibel value.
It should be noted that the preset decibel value may be set and changed as needed; in the embodiment of the present application, the preset decibel value is set to 5 dB.
The first frequency spectrogram and the second frequency spectrogram both comprise a plurality of frequency bands, and the coherent frequency band is a frequency band with a decibel difference between the first frequency spectrogram and the second frequency spectrogram being smaller than a preset decibel value, so that the coherence of the first audio signal and the second audio signal corresponding to the determined coherent frequency band is high.
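The following sketch illustrates steps A1 and A2: computing the two spectra in decibels and selecting the bins whose decibel difference is below the preset value. The FFT length, sampling rate, and treatment of each bin as its own band are assumptions of this sketch, not details from the embodiment; `freqs` and `mask` are reused in the fusion-coefficient sketch after step A3.

```python
import numpy as np

def coherent_band(first, second, sr=16000, n_fft=1024, preset_db=5.0):
    """Return (freqs, mask): bin frequencies in Hz and a boolean mask of the
    bins where the dB difference between the two spectra is below preset_db
    (the coherent frequency band)."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    spec1 = 20 * np.log10(np.abs(np.fft.rfft(first, n_fft)) + 1e-12)   # first spectrogram, dB
    spec2 = 20 * np.log10(np.abs(np.fft.rfft(second, n_fft)) + 1e-12)  # second spectrogram, dB
    mask = np.abs(spec1 - spec2) < preset_db
    return freqs, mask
```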
A3: a target fusion coefficient is determined based on the coherent frequency band, the first audio signal, and the third audio signal.
The method comprises the following two implementation modes:
The first implementation mode: determining that the minimum frequency of the coherent frequency band is not lower than a preset frequency; respectively determining first energy and second energy, wherein the first energy is the energy sum of the first audio signal in the coherent frequency band, and the second energy is the energy sum of the third audio signal in the coherent frequency band; the quotient of the second energy and the first energy is determined as a target fusion coefficient.
It should be noted that the preset frequency may be set and changed as needed; in the embodiment of the present application, the preset frequency is set to 2000 Hz.
Fourier transform is performed on the first audio signal in the coherent frequency band to generate a first power spectrum of the first audio signal in the coherent frequency band, and Fourier transform is performed on the third audio signal in the coherent frequency band to generate a second power spectrum of the third audio signal in the coherent frequency band. The abscissa of the first power spectrum and the second power spectrum is frequency, and the ordinate is power. The energy sum of the first audio signal in the coherent frequency band is the area covered by the first power spectrum, and the energy sum of the third audio signal in the coherent frequency band is the area covered by the second power spectrum.
In the embodiment of the present application, the second energy is the energy sum, in the coherent frequency band, of the third audio signal containing the other noise signals, and the first energy is the energy sum, in the coherent frequency band, of the first audio signal containing the voice signal, the second noise signal, and the other noise signals. Taking the quotient of the second energy and the first energy as the target fusion coefficient therefore makes the target fusion coefficient represent the weight of the other noise signals in the first audio signal within the coherent frequency band. Obviously, if the coherent frequency band contains more voice signal, it indicates that it contains less of the second noise signal, and the value of the target fusion coefficient is larger; if the coherent frequency band contains more noise signal, it contains more of the second noise signal, and the value of the target fusion coefficient is smaller.
The second implementation mode: determining that the minimum frequency of the coherent frequency band is lower than a preset frequency; respectively determining first energy, second energy and third energy, wherein the first energy is the energy sum of the first audio signal in a coherent frequency band; the second energy is the energy sum of the third audio signal in the coherent frequency band, the third energy is the energy sum of the first audio signal in the full frequency band, and the full frequency band is the frequency band included by the first spectrogram. The quotient of the second energy and the first energy is determined as a first fusion coefficient. The quotient of the first energy and the third energy is determined as a second fusion coefficient. And determining the average value of the first fusion coefficient and the second fusion coefficient as a target fusion coefficient.
Fourier transform is performed on the first audio signal in the coherent frequency band to generate a first power spectrum of the first audio signal in the coherent frequency band; Fourier transform is performed on the third audio signal in the coherent frequency band to generate a second power spectrum of the third audio signal in the coherent frequency band; and Fourier transform is performed on the first audio signal in the full frequency band to generate a third power spectrum of the first audio signal in the full frequency band. The abscissa of the first, second, and third power spectra is frequency, and the ordinate is power. The energy sum of the first audio signal in the coherent frequency band is the area covered by the first power spectrum, the energy sum of the third audio signal in the coherent frequency band is the area covered by the second power spectrum, and the energy sum of the first audio signal in the full frequency band is the area covered by the third power spectrum.
In the embodiment of the present application, the first energy is the energy sum of the first audio signal in the coherent frequency band, the third energy is the energy sum of the first audio signal in the full frequency band, and the coherence of the first audio signal and the second audio signal in the coherent frequency band is high, i.e. the coherent frequency band includes the second noise signal from the second audio signal. In this way, taking the quotient of the first energy and the third energy as the second fusion coefficient makes the second fusion coefficient represent the weight, within the full-band first audio signal, of the part of the first audio signal containing the second noise signal, while the first fusion coefficient represents the weight of the other noise signals in the first audio signal within the coherent frequency band; taking the average of the first fusion coefficient and the second fusion coefficient as the target fusion coefficient then makes the target fusion coefficient represent the weight of all noise signals in the first audio signal. Obviously, if the coherent frequency band contains more voice signal, it indicates that it contains fewer noise signals, and the value of the target fusion coefficient is larger; if the coherent frequency band contains more noise signal, the value of the target fusion coefficient is smaller.
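Putting the two implementations of step A3 together, the sketch below computes the target fusion coefficient from the coherent-band mask returned by `coherent_band` above. Approximating each energy sum by summing FFT power over the selected bins (rather than integrating a power-spectrum area) is an assumption of this sketch.

```python
def band_energy(signal, mask, n_fft=1024):
    """Sum of spectral power of `signal` over the bins selected by `mask`."""
    power = np.abs(np.fft.rfft(signal, n_fft)) ** 2
    return float(np.sum(power[mask]))

def target_fusion_coefficient(first, third, freqs, mask,
                              preset_freq=2000.0, n_fft=1024):
    first_energy = band_energy(first, mask, n_fft)    # first audio, coherent band
    second_energy = band_energy(third, mask, n_fft)   # third audio, coherent band
    if mask.any() and freqs[mask].min() >= preset_freq:
        # First implementation: minimum coherent frequency >= preset frequency.
        return second_energy / first_energy
    # Second implementation: minimum coherent frequency < preset frequency.
    full_band = np.ones_like(mask, dtype=bool)
    third_energy = band_energy(first, full_band, n_fft)  # first audio, full band
    first_coeff = second_energy / first_energy
    second_coeff = first_energy / third_energy
    return 0.5 * (first_coeff + second_coeff)
```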
(2) And fusing one of the first audio signal and the third audio signal with the second audio signal based on the target fusion coefficient to obtain a target noise reference signal.
This step is achieved by the following steps A1-A2:
A1: based on the target fusion coefficient, generating first relation data, wherein the first relation data is relation data with parameters being the target fusion coefficient, independent variables being one of the first audio signal and the third audio signal, the second audio signal and dependent variables being target noise reference signals.
The first relation data is: N_new = α * N_old + (1 - α) * S
where N_new is the target noise reference signal, N_old is the second audio signal, S is one of the first audio signal and the third audio signal, and α is the target fusion coefficient.
A2: and fusing one of the first audio signal and the third audio signal with the second audio signal through the first relation data to obtain a target noise reference signal.
Substituting one audio signal of the first audio signal and the third audio signal and the second audio signal into the first relation data to obtain a target noise reference signal.
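The fusion of step A2 is a direct application of the first relation data; a minimal helper:

```python
def target_noise_reference(alpha, n_old, s):
    """First relation data: N_new = alpha * N_old + (1 - alpha) * S,
    applied sample-wise when n_old and s are numpy arrays."""
    return alpha * n_old + (1.0 - alpha) * s
```

With `alpha` from `target_fusion_coefficient`, `n_old` the second audio signal, and `s` either the first or the third audio signal, the result is the target noise reference signal used in step 204.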
It should be noted that, in the embodiment of the present application, the coherent frequency band of the first audio signal may be determined by VAD (Voice Activity Detection) processing. Referring to the first relation data, the fusion is tolerant of VAD deviations: if a segment is misjudged as a speech segment, the target fusion coefficient is larger, a larger proportion of the second audio signal is retained, and the voice signal of the one of the first audio signal and the third audio signal is damaged less; if a segment is misjudged as a noise segment, the target fusion coefficient is smaller, a larger proportion of the one of the first audio signal and the third audio signal is retained, and the noise signal in that audio signal can still be effectively filtered out.
In the embodiment of the application, the target noise reference signal is obtained by fusing one of the first audio signal and the third audio signal with the second audio signal; thus, the target noise reference signal comprises the voice signal and the noise signal in the first audio signal and also comprises the voice signal and the noise signal in the second audio signal, so that when the noise reduction processing is carried out on the third audio signal based on the target noise reference signal, all the noise signals in the third audio signal can be effectively removed.
Step 204: and carrying out noise reduction processing on the third audio signal based on the target noise reference signal to obtain a target voice signal.
This step can be achieved by the following steps (1) - (3):
(1) A first speech signal in the third audio signal is determined.
This step can be achieved by the following steps A1-A2:
a1: from the third audio signal, a fourth audio signal is determined.
The fourth audio signal is a signal containing a noise signal in the third audio signal. And performing VAD processing on the third audio signal to obtain a fourth audio signal.
In the embodiment of the present application, performing VAD processing on the third audio signal makes it possible to identify and eliminate long silent periods from the third audio signal, which saves speech channel resources without reducing service quality. VAD processing can also identify the portion of the third audio signal that contains a noise signal, so performing VAD processing on the third audio signal yields the fourth audio signal containing a noise signal.
A2: and taking the voice signal in the fourth audio signal as the first voice signal.
The fourth audio signal includes not only a noise signal but also a speech signal. And filtering the fourth audio signal, and eliminating the noise signal in the fourth audio signal to obtain a voice signal in the fourth audio signal as the first voice signal. The fourth audio signal is filtered by the first adaptive filter.
(2) And removing the first voice signal in the target noise reference signal to obtain a first noise signal.
The target noise reference signal comprises a voice signal and a noise signal; in the embodiment of the application, after the first voice signal in the target noise reference signal is removed, a pure first noise signal which does not comprise the voice signal is obtained.
(3) And removing the first noise signal in the third audio signal to obtain a target voice signal.
Wherein the third audio signal includes a speech signal and a first noise signal. Filtering the first noise signal to obtain a filtered first noise signal; and removing the first noise signal after the filtering processing from the third audio signal to obtain a target voice signal. The first noise signal is filtered by the second adaptive filter.
Referring to fig. 6, fig. 6 is a flowchart of performing noise reduction processing on the third audio signal based on the target noise reference signal to obtain a target speech signal. And performing VAD processing on the third audio signal to obtain a fourth audio signal. And filtering the fourth audio signal, and eliminating the noise signal in the fourth audio signal to obtain the first voice signal. And removing the first voice signal to obtain a purer first noise signal. Filtering the first noise signal to obtain a filtered first noise signal; and removing the first noise signal after the filtering processing from the third audio signal to obtain a target voice signal.
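As an illustration of this second stage, the sketch below follows the Fig. 6 flow, reusing the `nlms` helper from the step 202 sketch. The energy-threshold VAD and the use of the third audio signal as the speech reference for the first adaptive filter are assumptions of this sketch; the embodiment only states that VAD processing and adaptive filtering are applied.

```python
def energy_vad_mask(x, frame=256, ratio=0.5):
    """Crude energy-threshold VAD placeholder: returns a per-sample boolean
    mask marking the segments treated as the noise-bearing fourth signal,
    plus the number of samples covered by whole frames."""
    n = len(x) // frame * frame
    energies = (x[:n].reshape(-1, frame) ** 2).sum(axis=1)
    mask = energies < ratio * energies.max()
    return np.repeat(mask, frame), n

def second_stage_noise_reduction(third, noise_ref, order=64):
    """Sketch of step 204 (cf. Fig. 6): returns the target voice signal."""
    mask, n = energy_vad_mask(third)
    third, noise_ref = third[:n], noise_ref[:n]
    fourth = np.where(mask, third, 0.0)                 # fourth audio signal
    # First adaptive filter: estimate the voice component of the fourth signal,
    # using the third signal as the speech reference (an assumption here).
    first_voice, _ = nlms(third, fourth, order)
    # Remove the first voice signal from the target noise reference signal;
    # the residual is the first noise signal.
    _, first_noise = nlms(first_voice, noise_ref, order)
    # Second adaptive filter: estimate that noise in the third signal and
    # subtract it; the residual is the target voice signal.
    _, target_voice = nlms(first_noise, third, order)
    return target_voice
```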
In the embodiment of the present application, noise reduction processing is performed on the third audio signal based on the target noise reference signal; because the target noise reference signal contains all of the noise signals in the third audio signal, the noise signals in the third audio signal can be removed cleanly through the target noise reference signal, which improves the purity of the target voice signal obtained by noise reduction. While preserving the listening experience and with little or no damage to the voice, the noise reduction method for the audio signal provided by the embodiment of the present application eliminates the residual noise caused by insufficient coherence between the two audio signals, and improves the robustness of audio noise reduction.
Step 205: and performing target operation on the target voice signal.
This step includes several implementations:
In a possible implementation manner, the method for noise reduction of an audio signal provided by the embodiment of the application is applied to a scene of voice control, and a target voice signal is sent to enable a communication opposite terminal to execute a command corresponding to the target voice signal.
In this implementation, the electronic device is a voice control terminal, and the communication peer may be a mobile phone, a robot, or the like. After the electronic equipment sends the target voice signal to the opposite communication terminal, the opposite communication terminal receives the target voice signal, so that the opposite communication terminal executes a command corresponding to the target voice signal, and further the application of the target voice signal obtained by the method provided by the embodiment of the application in voice control is realized.
In another possible implementation manner, the noise reduction method for an audio signal provided by the embodiment of the application is applied to a scene where audio is played, and a target voice signal is played.
In this implementation, the electronic device is a playback terminal, which may be a microphone, an earphone, or the like. After the target voice signal is played, the application of the target voice signal obtained by the method provided by the embodiment of the application on audio playing is realized.
In another possible implementation manner, the method for noise reduction of an audio signal provided by the embodiment of the application is applied to a scene of recording audio, and generates an audio file based on a target voice signal.
In this implementation, the electronic device is a recording terminal, which may be a recording pen, a recorder, a video camera, etc. Based on the target voice signal, an audio file is generated, and the application of the target voice signal obtained by the method provided by the embodiment of the application on audio recording is realized.
In the embodiment of the application, the target noise reference signal is obtained by fusing one of the first audio signal and the third audio signal with the second audio signal, so that the target noise reference signal comprises the noise signal in the second audio signal and the noise signal in the first audio signal which is not recorded by the second audio signal, and further the noise reduction processing is carried out on the third audio signal based on the target noise reference signal, so that the noise signal in the third audio signal can be effectively removed, and the purity of the target voice signal after noise reduction is improved.
An embodiment of the present application provides a noise reduction device for an audio signal, referring to fig. 7, the device includes:
The first processing module 701 is configured to perform noise reduction processing on the first audio signal based on the first audio signal and the second audio signal, so as to obtain a third audio signal, where the first audio signal and the second audio signal are audio signals with coherence; a determining module 702, configured to obtain a target noise reference signal based on one of the first audio signal and the third audio signal and the second audio signal; the second processing module 703 is configured to perform noise reduction processing on the third audio signal based on the target noise reference signal, so as to obtain a target speech signal.
In some possible embodiments, the first audio signal and the second audio signal are respectively audio signals with coherence recorded by the two audio acquisition components.
In some possible embodiments, the noise signal of the first audio signal is less than the noise signal of the second audio signal.
In one possible implementation, the determining module 702 includes: and the fusion unit is used for fusing one of the first audio signal and the third audio signal with the second audio signal to obtain a target noise reference signal.
In one possible implementation, the fusion unit includes: a determining subunit configured to determine a target fusion coefficient based on the first audio signal, the second audio signal, and the third audio signal; and the fusion subunit is used for fusing one of the first audio signal and the third audio signal with the second audio signal based on the target fusion coefficient to obtain a target noise reference signal.
In one possible implementation, the determining subunit is configured to: respectively determining a first spectrogram of a first audio signal and a second spectrogram of a second audio signal, wherein the abscissa of the first spectrogram and the second spectrogram is frequency, and the ordinate of the first spectrogram and the second spectrogram is decibel; determining a coherent frequency band in the first spectrogram, wherein the coherent frequency band is a frequency band with a decibel difference between the coherent frequency band and the second spectrogram being smaller than a preset decibel value; a target fusion coefficient is determined based on the coherent frequency band, the first audio signal, and the third audio signal.
In one possible implementation, the determining subunit is configured to: determining that the minimum frequency of the coherent frequency band is not lower than a preset frequency; respectively determining first energy and second energy, wherein the first energy is the energy sum of the first audio signal in the coherent frequency band, and the second energy is the energy sum of the third audio signal in the coherent frequency band; the quotient of the second energy and the first energy is determined as a target fusion coefficient.
In one possible implementation, the determining subunit is configured to: determining that the minimum frequency of the coherent frequency band is lower than a preset frequency; respectively determining first energy, second energy and third energy, wherein the first energy is the energy sum of the first audio signal in the coherent frequency band, the second energy is the energy sum of the third audio signal in the coherent frequency band, the third energy is the energy sum of the first audio signal in a full frequency band, and the full frequency band is a frequency band included by a first spectrogram; determining the quotient of the second energy and the first energy as a first fusion coefficient; determining the quotient of the first energy and the third energy as a second fusion coefficient; and determining the average value of the first fusion coefficient and the second fusion coefficient as a target fusion coefficient.
In one possible implementation, the fusion subunit is configured to: generate first relation data based on the target fusion coefficient, where the first relation data is relation data whose parameter is the target fusion coefficient, whose independent variables are one of the first audio signal and the third audio signal, and the second audio signal, and whose dependent variable is the target noise reference signal; and fuse one of the first audio signal and the third audio signal with the second audio signal through the first relation data to obtain the target noise reference signal.
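The functional form of the first relation data is not fixed in this passage, so the sketch below assumes the simplest case, a weighted sum in which the target fusion coefficient scales one input. Treat it only as an illustration of the roles of parameter, independent variables, and dependent variable, not as the relation this application prescribes.

```python
import numpy as np

def fuse_noise_reference(alpha, x, y):
    """Sketch of 'first relation data', assumed linear for illustration.
    alpha - target fusion coefficient (the parameter)
    x     - one of the first/third audio signals (an independent variable)
    y     - the second audio signal (the other independent variable)
    Returns the target noise reference signal (the dependent variable)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = min(len(x), len(y))
    return alpha * x[:n] + y[:n]
```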
In one possible implementation, the second processing module 703 includes: a determining unit, configured to determine a first voice signal in the third audio signal; a first removing unit, configured to remove the first voice signal from the target noise reference signal to obtain a first noise signal; and a second removing unit, configured to remove the first noise signal from the third audio signal to obtain the target voice signal.
In one possible implementation, the determining unit is configured to: determine a fourth audio signal from the third audio signal, where the fourth audio signal is a signal in the third audio signal that contains a noise signal; and take the voice signal in the fourth audio signal as the first voice signal.
In one possible implementation, the first processing module 701 is configured to: determine a second voice signal in the first audio signal; remove the second voice signal from the second audio signal to obtain a second noise signal; and remove the second noise signal from the first audio signal to obtain the third audio signal.
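To make the relationship between modules 701-703 concrete, here is a minimal end-to-end sketch. The application does not specify how the voice signal is estimated or how "removal" is performed, so magnitude spectral subtraction is assumed for removal, and the speech estimator and fusion rule are caller-supplied stand-ins; names such as spectral_subtract and two_stage_denoise are illustrative only.

```python
import numpy as np

def spectral_subtract(signal, interference, floor=0.0):
    """Hypothetical 'removal' step: subtract the interference's magnitude
    spectrum from the signal's, keeping the signal's phase."""
    signal = np.asarray(signal, dtype=float)
    interference = np.asarray(interference, dtype=float)
    n = min(len(signal), len(interference))
    s = np.fft.rfft(signal[:n])
    i = np.fft.rfft(interference[:n])
    mag = np.maximum(np.abs(s) - np.abs(i), floor)
    return np.fft.irfft(mag * np.exp(1j * np.angle(s)), n)

def two_stage_denoise(first, second, estimate_speech, fuse):
    """End-to-end sketch of modules 701-703. estimate_speech and fuse are
    stand-ins for the unspecified speech detector and the fusion rule built
    from the target fusion coefficient."""
    # Stage 1 (module 701): remove from the first signal the noise shared with
    # the second signal.
    second_speech = estimate_speech(first)
    second_noise = spectral_subtract(second, second_speech)
    third = spectral_subtract(first, second_noise)
    # Module 702: build the target noise reference from (first or third) + second.
    noise_ref = fuse(third, second)
    # Stage 2 (module 703): remove the remaining noise from the third signal.
    first_speech = estimate_speech(third)
    first_noise = spectral_subtract(noise_ref, first_speech)
    return spectral_subtract(third, first_noise)
```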
In one possible implementation, the apparatus further includes: a sending module, configured to send the target voice signal so that the communication peer executes a command corresponding to the target voice signal; a playing module, configured to play the target voice signal; and a generating module, configured to generate an audio file based on the target voice signal.
Fig. 8 shows a block diagram of an electronic device 800 provided by an exemplary embodiment of the application. The electronic device 800 may be a portable mobile electronic device such as a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The electronic device 800 may also be referred to by other names such as user device, portable electronic device, laptop electronic device, or desktop electronic device.
Generally, the electronic device 800 includes: a processor 801 and a memory 802.
The processor 801 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 801 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 801 may also include a main processor and a coprocessor. The main processor is a processor for processing data in an awake state, also referred to as a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 801 may integrate a GPU (Graphics Processing Unit) for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 801 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 802 may include one or more computer-readable storage media, which may be non-transitory. Memory 802 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in memory 802 is used to store at least one instruction that is executed by the processor 801 to implement the noise reduction method for an audio signal provided by the method embodiments of the present application.
In some embodiments, the electronic device 800 may optionally further include a peripheral interface 803 and at least one peripheral. The processor 801, the memory 802, and the peripheral interface 803 may be connected by a bus or signal line. Each peripheral device may be connected to the peripheral interface 803 by a bus, signal line, or circuit board. Specifically, the peripherals include at least one of a radio frequency circuit 804, a display 805, a camera assembly 806, an audio circuit 807, a positioning component 808, and a power supply 809.
The peripheral interface 803 may be used to connect at least one Input/Output (I/O) related peripheral to the processor 801 and the memory 802. In some embodiments, the processor 801, the memory 802, and the peripheral interface 803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 801, the memory 802, and the peripheral interface 803 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 804 is configured to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuit 804 communicates with communication networks and other communication devices via electromagnetic signals, converting an electrical signal into an electromagnetic signal for transmission, or converting a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 804 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 804 may communicate with other electronic devices via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 804 may further include NFC (Near Field Communication) related circuits, which is not limited by the present application.
The display 805 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 805 is a touch display, the display 805 also has the ability to collect touch signals at or above its surface. The touch signal may be input to the processor 801 as a control signal for processing. In this case, the display 805 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display 805, disposed on the front panel of the electronic device 800; in other embodiments, there may be at least two displays 805, disposed on different surfaces of the electronic device 800 or in a folded design; in still other embodiments, the display 805 may be a flexible display disposed on a curved surface or a folded surface of the electronic device 800. The display 805 may even be arranged in an irregular, non-rectangular pattern, that is, a shaped screen. The display 805 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera assembly 806 is used to capture images or video. Optionally, the camera assembly 806 includes a front camera and a rear camera. In general, the front camera is disposed on the front panel of the electronic device, and the rear camera is disposed on the rear surface of the electronic device. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and Virtual Reality (VR) shooting functions or other fused shooting functions. In some embodiments, the camera assembly 806 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation under different color temperatures.
The audio circuit 807 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, convert the sound waves into electrical signals, and input the electrical signals to the processor 801 for processing, or to the radio frequency circuit 804 for voice communication. For stereo acquisition or noise reduction, there may be multiple microphones, disposed at different locations of the electronic device 800. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electrical signals from the processor 801 or the radio frequency circuit 804 into sound waves. The speaker may be a conventional thin-film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert electrical signals not only into sound waves audible to humans but also into sound waves inaudible to humans for ranging and other purposes. In some embodiments, the audio circuit 807 may also include a headphone jack.
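For context only, the short sketch below shows how two coherent channels (for example, the first and second audio signals processed by the apparatus above) could be captured from a device with two microphones. The sounddevice dependency, the 16 kHz sample rate, and the function name record_two_channels are assumptions for the example, not part of this application.

```python
import sounddevice as sd  # assumed third-party dependency

def record_two_channels(duration_s=3.0, sample_rate=16000):
    """Sketch: capture two coherent channels from a two-microphone device;
    device selection and setup are assumed to be handled elsewhere."""
    frames = int(duration_s * sample_rate)
    audio = sd.rec(frames, samplerate=sample_rate, channels=2, dtype="float32")
    sd.wait()                        # block until the recording is finished
    return audio[:, 0], audio[:, 1]  # first and second audio signals
```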
The positioning component 808 is used to locate the current geographic location of the electronic device 800 for navigation or LBS (Location Based Service). The positioning component 808 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the Galileo system of the European Union.
The power supply 809 is used to power the various components in the electronic device 800. The power supply 809 may use alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 809 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is charged through a wired line, and a wireless rechargeable battery is charged through a wireless coil. The rechargeable battery may also be used to support fast-charge technology.
In some embodiments, the electronic device 800 also includes one or more sensors 810. The one or more sensors 810 include, but are not limited to: acceleration sensor 811, gyroscope sensor 812, pressure sensor 813, fingerprint sensor 814, optical sensor 815, and proximity sensor 816.
The acceleration sensor 811 can detect the magnitudes of accelerations on three coordinate axes of the coordinate system established with the electronic device 800. For example, the acceleration sensor 811 may be used to detect components of gravitational acceleration in three coordinate axes. The processor 801 may control the display screen 805 to display a user interface in a landscape view or a portrait view based on the gravitational acceleration signal acquired by the acceleration sensor 811. Acceleration sensor 811 may also be used for the acquisition of motion data of a game or user.
The gyro sensor 812 may detect a body direction and a rotation angle of the electronic device 800, and the gyro sensor 812 may collect a 3D motion of the user on the electronic device 800 in cooperation with the acceleration sensor 811. The processor 801 may implement the following functions based on the data collected by the gyro sensor 812: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.
The pressure sensor 813 may be disposed on a side frame of the electronic device 800 and/or at a lower layer of the display 805. When the pressure sensor 813 is disposed on a side frame of the electronic device 800, it can detect a user's grip signal on the electronic device 800, and the processor 801 performs left-right hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 813. When the pressure sensor 813 is disposed at the lower layer of the display 805, the processor 801 controls the operability controls on the UI according to the user's pressure operations on the display 805. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 814 is used to collect a user's fingerprint, and the processor 801 identifies the user based on the fingerprint collected by the fingerprint sensor 814, or the fingerprint sensor 814 identifies the user based on the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 801 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 814 may be disposed on the front, back, or side of the electronic device 800. When a physical key or vendor logo is provided on the electronic device 800, the fingerprint sensor 814 may be integrated with the physical key or vendor logo.
The optical sensor 815 is used to collect the ambient light intensity. In one embodiment, the processor 801 may control the display brightness of the display screen 805 based on the intensity of ambient light collected by the optical sensor 815. Specifically, when the intensity of the ambient light is high, the display brightness of the display screen 805 is turned up; when the ambient light intensity is low, the display brightness of the display screen 805 is turned down. In another embodiment, the processor 801 may also dynamically adjust the shooting parameters of the camera module 806 based on the ambient light intensity collected by the optical sensor 815.
The proximity sensor 816, also referred to as a distance sensor, is typically provided on the front panel of the electronic device 800. The proximity sensor 816 is used to collect the distance between the user and the front of the electronic device 800. In one embodiment, when the proximity sensor 816 detects that the distance between the user and the front of the electronic device 800 gradually decreases, the processor 801 controls the display 805 to switch from the screen-on state to the screen-off state; when the proximity sensor 816 detects that the distance between the user and the front of the electronic device 800 gradually increases, the processor 801 controls the display 805 to switch from the screen-off state to the screen-on state.
Those skilled in the art will appreciate that the structure shown in fig. 8 is not limiting and that more or fewer components than shown may be included or certain components may be combined or a different arrangement of components may be employed.
An embodiment of the present application provides a computer-readable storage medium having at least one program code stored therein, the at least one program code being loaded and executed by a processor to implement operations performed by a noise reduction method for an audio signal as described above.
Embodiments of the present application provide a computer program product or a computer program comprising computer program code stored in a computer readable storage medium. The processor of the electronic device reads the computer program code from the computer readable storage medium and executes the computer program code such that the electronic device performs the operations performed by the above-described noise reduction method of an audio signal.
In some embodiments, a computer program according to an embodiment of the present application may be deployed to be executed on one electronic device, or on a plurality of electronic devices located at one site, or on a plurality of electronic devices distributed at a plurality of sites and interconnected by a communication network, where the plurality of electronic devices distributed at the plurality of sites and interconnected by the communication network may constitute a blockchain system.
In the embodiments of the present application, the target noise reference signal is obtained based on the second audio signal and one of the first audio signal and the third audio signal. The target noise reference signal therefore contains both the noise signal in the second audio signal and the noise signal in the first audio signal that is not recorded by the second audio signal. Performing noise reduction processing on the third audio signal based on this target noise reference signal can effectively remove the noise signal in the third audio signal and improve the purity of the noise-reduced target voice signal.
The foregoing descriptions are merely preferred embodiments of the present application and are not intended to limit the application; the scope of protection of the application is defined by the appended claims.

Claims (11)

1. A method of noise reduction of an audio signal, comprising:
Determining a second speech signal in the first audio signal; removing the second voice signal in the second audio signal to obtain a second noise signal; removing the second noise signal in the first audio signal to obtain a third audio signal, wherein the first audio signal and the second audio signal are audio signals with coherence, the first audio signal and the second audio signal are acquired based on two audio acquisition components of the same electronic device, the energy of the voice signal in the first audio signal is larger than the energy of the voice signal in the second audio signal, the energy of the noise signal in the second audio signal is larger than the energy of the noise signal in the first audio signal, and the third audio signal comprises the voice signal in the first audio signal and comprises other noise signals except the second noise signal in the second audio signal;
Determining a target fusion coefficient based on the first audio signal, the second audio signal, and the third audio signal; fusing one of the first audio signal and the third audio signal with the second audio signal based on the target fusion coefficient to obtain a target noise reference signal, wherein the target noise reference signal comprises a voice signal and a noise signal in the first audio signal and comprises a voice signal and a noise signal in the second audio signal;
and carrying out noise reduction processing on the third audio signal based on the target noise reference signal so as to remove other noise signals in the third audio signal and obtain a target voice signal.
2. The method of noise reduction of an audio signal according to claim 1, wherein the determining a target fusion coefficient based on the first audio signal, the second audio signal, and the third audio signal comprises:
Respectively determining a first spectrogram of the first audio signal and a second spectrogram of the second audio signal, wherein the abscissa of the first spectrogram and the second spectrogram is frequency, and the ordinate of the first spectrogram and the second spectrogram is decibel;
Determining a coherent frequency band in the first spectrogram, wherein the coherent frequency band is a frequency band in which the decibel difference between the coherent frequency band and the second spectrogram in the first spectrogram is smaller than a preset decibel value;
The target fusion coefficient is determined based on the coherent frequency band, the first audio signal, and the third audio signal.
3. The method of noise reduction of an audio signal according to claim 2, wherein said determining the target fusion coefficient based on the coherent frequency band, the first audio signal and the third audio signal comprises:
determining that the minimum frequency of the coherent frequency band is not lower than a preset frequency;
Respectively determining first energy and second energy, wherein the first energy is the energy sum of the first audio signal in the coherent frequency band, and the second energy is the energy sum of the third audio signal in the coherent frequency band;
determining a quotient of the second energy and the first energy as the target fusion coefficient.
4. The method of noise reduction of an audio signal according to claim 2, wherein said determining the target fusion coefficient based on the coherent frequency band, the first audio signal and the third audio signal comprises:
determining that the minimum frequency of the coherent frequency band is lower than a preset frequency;
respectively determining first energy, second energy and third energy, wherein the first energy is the energy sum of the first audio signal in the coherent frequency band, the second energy is the energy sum of the third audio signal in the coherent frequency band, the third energy is the energy sum of the first audio signal in a full frequency band, and the full frequency band is a frequency band included by the first spectrogram;
Determining a quotient of the second energy and the first energy as a first fusion coefficient;
Determining a quotient of the first energy and the third energy as a second fusion coefficient;
and determining the average value of the first fusion coefficient and the second fusion coefficient as the target fusion coefficient.
5. The method for noise reduction of an audio signal according to claim 1, wherein the fusing one of the first audio signal and the third audio signal with the second audio signal based on the target fusion coefficient to obtain a target noise reference signal comprises:
Generating first relation data based on the target fusion coefficient, wherein the first relation data is relation data in which the parameter is the target fusion coefficient, the independent variables are one of the first audio signal and the third audio signal, and the second audio signal, and the dependent variable is the target noise reference signal;
and fusing one of the first audio signal and the third audio signal with the second audio signal through the first relation data to obtain the target noise reference signal.
6. The method for noise reduction of an audio signal according to any one of claims 1 to 5, wherein the noise reduction processing is performed on the third audio signal based on the target noise reference signal to remove other noise signals in the third audio signal, so as to obtain a target speech signal, and the method comprises:
determining a first speech signal in the third audio signal;
Removing the first voice signal in the target noise reference signal to obtain a first noise signal;
And removing the first noise signal in the third audio signal to obtain the target voice signal.
7. The method of noise reduction of an audio signal according to claim 6, wherein said determining a first speech signal in the third audio signal comprises:
determining a fourth audio signal from the third audio signal, wherein the fourth audio signal is a signal containing a noise signal in the third audio signal;
And taking the voice signal in the fourth audio signal as the first voice signal.
8. The method of noise reduction of an audio signal according to claim 1, further comprising:
Transmitting the target voice signal to enable the communication peer to execute a command corresponding to the target voice signal; or
Playing the target voice signal; or
An audio file is generated based on the target speech signal.
9. A noise reduction device for an audio signal, the device comprising:
A first processing module for determining a second speech signal in the first audio signal; removing the second voice signal in the second audio signal to obtain a second noise signal; removing the second noise signal in the first audio signal to obtain a third audio signal, wherein the first audio signal and the second audio signal are audio signals with coherence, the first audio signal and the second audio signal are acquired based on two audio acquisition components of the same electronic device, the energy of the voice signal in the first audio signal is larger than the energy of the voice signal in the second audio signal, the energy of the noise signal in the second audio signal is larger than the energy of the noise signal in the first audio signal, and the third audio signal comprises the voice signal in the first audio signal and comprises other noise signals except the second noise signal in the second audio signal;
A determining module configured to determine a target fusion coefficient based on the first audio signal, the second audio signal, and the third audio signal; fusing one of the first audio signal and the third audio signal with the second audio signal based on the target fusion coefficient to obtain a target noise reference signal, wherein the target noise reference signal comprises a voice signal and a noise signal in the first audio signal and comprises a voice signal and a noise signal in the second audio signal;
And the second processing module is used for carrying out noise reduction processing on the third audio signal based on the target noise reference signal so as to remove other noise signals in the third audio signal and obtain a target voice signal.
10. An electronic device comprising one or more processors and one or more memories, the one or more memories having stored therein at least one program code loaded and executed by the one or more processors to perform the operations performed by the method of noise reduction of an audio signal as claimed in any of claims 1 to 8.
11. A computer readable storage medium having stored therein at least one program code, the at least one program code being loaded and executed by a processor to implement the operations performed by the method of noise reduction of an audio signal as claimed in any one of claims 1 to 8.
CN202110777962.8A 2021-07-09 2021-07-09 Noise reduction method and device for audio signal, electronic equipment and storage medium Active CN113539291B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110777962.8A CN113539291B (en) 2021-07-09 2021-07-09 Noise reduction method and device for audio signal, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110777962.8A CN113539291B (en) 2021-07-09 2021-07-09 Noise reduction method and device for audio signal, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113539291A CN113539291A (en) 2021-10-22
CN113539291B true CN113539291B (en) 2024-06-25

Family

ID=78098184

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110777962.8A Active CN113539291B (en) 2021-07-09 2021-07-09 Noise reduction method and device for audio signal, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113539291B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111402913A (en) * 2020-02-24 2020-07-10 北京声智科技有限公司 Noise reduction method, device, equipment and storage medium

Family Cites Families (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5374845B2 (en) * 2007-07-25 2013-12-25 日本電気株式会社 Noise estimation apparatus and method, and program
CN102347027A (en) * 2011-07-07 2012-02-08 瑞声声学科技(深圳)有限公司 Double-microphone speech enhancer and speech enhancement method thereof
US9685171B1 (en) * 2012-11-20 2017-06-20 Amazon Technologies, Inc. Multiple-stage adaptive filtering of audio signals
CN104754430A (en) * 2013-12-30 2015-07-01 重庆重邮信科通信技术有限公司 Noise reduction device and method for terminal microphone
US9978388B2 (en) * 2014-09-12 2018-05-22 Knowles Electronics, Llc Systems and methods for restoration of speech components
US9886966B2 (en) * 2014-11-07 2018-02-06 Apple Inc. System and method for improving noise suppression using logistic function and a suppression target value for automatic speech recognition
CN105225672B (en) * 2015-08-21 2019-02-22 胡旻波 Merge the system and method for the dual microphone orientation noise suppression of fundamental frequency information
CN107945815B (en) * 2017-11-27 2021-09-07 歌尔科技有限公司 Voice signal noise reduction method and device
WO2019112468A1 (en) * 2017-12-08 2019-06-13 Huawei Technologies Co., Ltd. Multi-microphone noise reduction method, apparatus and terminal device
CN109767783B (en) * 2019-02-15 2021-02-02 深圳市汇顶科技股份有限公司 Voice enhancement method, device, equipment and storage medium
US11011182B2 (en) * 2019-03-25 2021-05-18 Nxp B.V. Audio processing system for speech enhancement
CN110010143B (en) * 2019-04-19 2020-06-09 出门问问信息科技有限公司 Voice signal enhancement system, method and storage medium
TWI738532B (en) * 2019-10-27 2021-09-01 英屬開曼群島商意騰科技股份有限公司 Apparatus and method for multiple-microphone speech enhancement
CN110853664B (en) * 2019-11-22 2022-05-06 北京小米移动软件有限公司 Method and device for evaluating performance of speech enhancement algorithm and electronic equipment
CN110856072B (en) * 2019-12-04 2021-03-19 北京声加科技有限公司 Earphone conversation noise reduction method and earphone
CN111063366A (en) * 2019-12-26 2020-04-24 紫光展锐(重庆)科技有限公司 Method and device for reducing noise, electronic equipment and readable storage medium
CN111402918B (en) * 2020-03-20 2023-08-08 北京达佳互联信息技术有限公司 Audio processing method, device, equipment and storage medium
CN111986691B (en) * 2020-09-04 2024-02-02 腾讯科技(深圳)有限公司 Audio processing method, device, computer equipment and storage medium
CN112242149B (en) * 2020-12-03 2021-03-26 北京声智科技有限公司 Audio data processing method and device, earphone and computer readable storage medium
CN112735461B (en) * 2020-12-29 2024-06-07 西安讯飞超脑信息科技有限公司 Pickup method, and related device and equipment
CN112863535B (en) * 2021-01-05 2022-04-26 中国科学院声学研究所 Residual echo and noise elimination method and device
CN113035167A (en) * 2021-01-28 2021-06-25 广州朗国电子科技有限公司 Audio frequency tuning method and storage medium for active noise reduction

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111402913A (en) * 2020-02-24 2020-07-10 北京声智科技有限公司 Noise reduction method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on speech enhancement algorithm based on microphone array; Yan Shu et al.; 《自动化仪表》 (Process Automation Instrumentation); Vol. 40, No. 9; pp. 59-62 *

Also Published As

Publication number Publication date
CN113539291A (en) 2021-10-22

Similar Documents

Publication Publication Date Title
CN110764730B (en) Method and device for playing audio data
CN108401124B (en) Video recording method and device
CN111402913B (en) Noise reduction method, device, equipment and storage medium
CN108965757B (en) Video recording method, device, terminal and storage medium
CN108922506A (en) Song audio generation method, device and computer readable storage medium
CN111276122B (en) Audio generation method and device and storage medium
CN111681655A (en) Voice control method and device, electronic equipment and storage medium
CN109102811B (en) Audio fingerprint generation method and device and storage medium
CN110797042B (en) Audio processing method, device and storage medium
CN110868642B (en) Video playing method, device and storage medium
CN110473562B (en) Audio data processing method, device and system
CN114384466A (en) Sound source direction determining method, sound source direction determining device, electronic equipment and storage medium
CN110992954A (en) Method, device, equipment and storage medium for voice recognition
CN109448676B (en) Audio processing method, device and storage medium
CN113539291B (en) Noise reduction method and device for audio signal, electronic equipment and storage medium
CN111711841B (en) Image frame playing method, device, terminal and storage medium
CN111798863B (en) Method and device for eliminating echo, electronic equipment and readable storage medium
CN114388001A (en) Multimedia file playing method, device, equipment and storage medium
CN110660031B (en) Image sharpening method and device and storage medium
CN114594885A (en) Application icon management method, device and equipment and computer readable storage medium
CN111063372B (en) Method, device and equipment for determining pitch characteristics and storage medium
CN114595019A (en) Theme setting method, device and equipment of application program and storage medium
CN108877845B (en) Song playing method and device
CN108881715B (en) Starting method and device of shooting mode, terminal and storage medium
CN113592874A (en) Image display method and device and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant