CN112002339B

CN112002339B - Speech noise reduction method and device, computer-readable storage medium and electronic device

Info

Publication number: CN112002339B
Application number: CN202010713823.4A
Authority: CN
Inventors: 马路; 赵培; 苏腾荣
Original assignee: Haier Uplus Intelligent Technology Beijing Co Ltd
Current assignee: Haier Uplus Intelligent Technology Beijing Co Ltd
Priority date: 2020-07-22
Filing date: 2020-07-22
Publication date: 2024-01-26
Anticipated expiration: 2040-07-22
Also published as: CN112002339A

Abstract

The invention discloses a voice noise reduction method and device, a computer-readable storage medium and an electronic device. The method comprises: performing voice separation on voice data to be noise reduced to obtain first voice data and second voice data of the voice data; determining a first probability function corresponding to the first voice data and a second probability function corresponding to the second voice data; determining target noise reduction data of the voice data through the first probability function and the second probability function; and performing noise reduction processing on the voice data through the target noise reduction data to obtain target voice data. The voice data to be noise reduced is thereby separated into two branches, namely the first voice data and the second voice data, and the noise data estimated from the two branches is used to reduce the noise in the noisy voice data, which solves the technical problem in the prior art that the accuracy of noise reduction is low.

Description

Speech noise reduction method and device, computer-readable storage medium and electronic device

Technical Field

The present invention relates to the field of speech processing, and in particular, to a method and apparatus for noise reduction in speech, a computer-readable storage medium, and an electronic apparatus.

Background

Voice signal processing is currently a key technology in the field of human-machine interaction. Voice noise reduction enhances the input voice to obtain cleaner audio, plays an extremely important role in back-end voice recognition, and is a key technology in voice signal processing.

Current voice noise reduction mainly adopts the noise reduction method in the open-source tool WebRTC, namely: calculating the spectral flatness feature, the log-likelihood ratio (LRT) feature and the spectral difference feature of the input audio; updating a voice/noise probability function according to these features; updating the noise estimate according to the probability function; obtaining a Wiener filter according to the noise estimate; and using the Wiener filter to reduce the noise of the input audio. This method estimates the noise and the signal directly from the current noisy input, so the signal components inevitably affect the accuracy of the noise estimate, and the noise in turn inevitably affects the estimate of the signal, which degrades the final noise reduction effect.
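As a rough illustration of the step "updating the noise estimate according to the probability function", the following sketch shows a generic soft-decision recursive update. It is not WebRTC's actual code; the function name `update_noise_estimate`, the `smoothing` parameter and its default value are assumptions made for this example.

```python
import numpy as np

def update_noise_estimate(prev_noise, magnitude, speech_prob, smoothing=0.9):
    """Recursively update a per-bin noise magnitude estimate.

    Bins that are probably speech (speech_prob near 1) keep the previous
    estimate; bins that are probably noise (speech_prob near 0) move toward
    the current magnitude. All names and defaults here are illustrative.
    """
    prev_noise = np.asarray(prev_noise, dtype=float)
    magnitude = np.asarray(magnitude, dtype=float)
    speech_prob = np.clip(np.asarray(speech_prob, dtype=float), 0.0, 1.0)
    # Expected noise magnitude given the speech probability of each bin.
    expected = speech_prob * prev_noise + (1.0 - speech_prob) * magnitude
    # Exponential smoothing over time.
    return smoothing * prev_noise + (1.0 - smoothing) * expected
```

With this form, a bin whose speech probability is 1 leaves the estimate unchanged, which is one way to see why speech leaking into a "noise" frame distorts the estimate when the probability is wrong.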

In view of the above problems, no effective solution has been proposed at present.

Disclosure of Invention

The embodiment of the invention provides a voice noise reduction method and device, a computer-readable storage medium and an electronic device, which are used for at least solving the technical problem of low accuracy of voice noise reduction in the prior art.

According to an aspect of an embodiment of the present invention, there is provided a voice noise reduction method, including: performing voice separation on voice data to be noise reduced to obtain first voice data and second voice data of the voice data, wherein the proportion of voice signals in the first voice data is larger than a first threshold value, and the proportion of noise signals in the second voice data is larger than a second threshold value; performing time-frequency transformation on the first voice data and the second voice data respectively, and determining a first probability function corresponding to the first voice data and a second probability function corresponding to the second voice data; determining target noise reduction data of the voice data through the first probability function and the second probability function; and performing noise reduction processing on the voice data through the target noise reduction data to obtain target voice data.

According to another aspect of the embodiment of the present invention, there is also provided a voice noise reduction apparatus, including: the separation unit is used for carrying out voice separation on voice data to be subjected to noise reduction to obtain first voice data and second voice data of the voice data, wherein the proportion of the voice data in the first voice data is larger than a first threshold value, and the proportion of the noise data in the second voice data is larger than a second threshold value; the first determining unit is used for performing time-frequency transformation on the first voice data to determine a corresponding first probability function and performing time-frequency transformation on the second voice data to determine a corresponding second probability function; a second determining unit configured to determine target noise reduction data of the speech data by the first probability function and the second probability function; the noise reduction unit is used for carrying out noise reduction processing on the voice data through the target noise reduction data to obtain target voice data.

According to a further aspect of embodiments of the present invention, there is also provided a computer-readable storage medium having a computer program stored therein, wherein the computer program is configured to perform the above-described speech noise reduction method when run.

According to still another aspect of the embodiments of the present invention, there is further provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the above-mentioned voice noise reduction method through the computer program.

In the embodiment of the invention, voice separation is performed on voice data to be noise reduced to obtain first voice data and second voice data of the voice data, wherein the proportion of the voice data in the first voice data is larger than a first threshold value, and the proportion of the noise data in the second voice data is larger than a second threshold value; time-frequency transformation is performed on the first voice data to determine a corresponding first probability function, and on the second voice data to determine a corresponding second probability function; target noise reduction data of the voice data is determined through the first probability function and the second probability function; and noise reduction processing is performed on the voice data through the target noise reduction data to obtain target voice data. In this way, the voice data to be noise reduced is separated into two branches, namely the first voice data and the second voice data, and the noise data estimated from the two branches is used to reduce the noise in the noisy voice data. This achieves the technical effect of performing noise reduction according to the noise data in the voice data, and solves the technical problem in the prior art that the accuracy of noise reduction is low.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:

FIG. 1 is a schematic illustration of an application environment of an alternative speech noise reduction method according to an embodiment of the invention;

FIG. 2 is a flow chart of an alternative method of speech noise reduction according to an embodiment of the invention;

FIG. 3 is a flow chart of an alternative method of speech noise reduction according to an embodiment of the invention;

FIG. 4 is a schematic diagram of an alternative speech noise reduction device according to an embodiment of the invention;

FIG. 5 is a schematic structural diagram of an electronic device for an alternative voice noise reduction method according to an embodiment of the present invention.

Detailed Description

In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

According to an aspect of the embodiments of the present invention, a voice noise reduction method is provided. As an optional implementation, the voice noise reduction method may be applied, but is not limited to being applied, in the hardware environment shown in FIG. 1, which may include, but is not limited to, the user equipment 102, the network 110, and the server 112.

The user equipment 102 may include, but is not limited to, a display 104, a processor 106, and a memory 108. The display 104 is used for acquiring the voice data to be noise reduced through a human-machine interaction interface. The processor 106 is configured to respond to the human-machine interaction instruction and separate the voice data to be noise reduced to obtain first voice data and second voice data, where the proportion of voice data in the first voice data is greater than a first threshold value and the proportion of noise data in the second voice data is greater than a second threshold value. The memory 108 is used for storing information such as the voice data to be noise reduced, the first voice data, and the second voice data. The server 112 may include, but is not limited to, a database 114 and a processing engine 116. The processing engine 116 is used for calling the first voice data and the second voice data stored in the database 114; performing time-frequency transformation on the first voice data to determine a corresponding first probability function, and performing time-frequency transformation on the second voice data to determine a corresponding second probability function; determining target noise reduction data of the voice data through the first probability function and the second probability function; and performing noise reduction processing on the voice data through the target noise reduction data to obtain target voice data. In this way, the voice data to be noise reduced is separated into two branches, namely the first voice data and the second voice data, and the noise data estimated from the two branches is used to reduce the noise in the noisy voice data. This achieves the technical effect of performing noise reduction according to the noise data in the voice data, and solves the technical problem in the prior art that the accuracy of noise reduction is low.

The specific process is as follows. As shown in steps S102-S110, the terminal device 102 separates the voice data to be noise reduced to obtain the first voice data and the second voice data, and sends the first voice data and the second voice data to the server 112 through the network 110. The server 112 performs time-frequency transformation on the first voice data to determine a corresponding first probability function, and performs time-frequency transformation on the second voice data to determine a corresponding second probability function; determines target noise reduction data of the voice data through the first probability function and the second probability function; and performs noise reduction processing on the voice data through the target noise reduction data to obtain target voice data. The server 112 then returns the result to the terminal device 102.

Then, as shown in steps S114-S116, the terminal device 102 performs voice separation on the voice data to be noise reduced to obtain first voice data and second voice data of the voice data, where the proportion of voice data in the first voice data is greater than a first threshold value, and the proportion of noise data in the second voice data is greater than a second threshold value; performs time-frequency transformation on the first voice data to determine a corresponding first probability function, and performs time-frequency transformation on the second voice data to determine a corresponding second probability function; determines target noise reduction data of the voice data through the first probability function and the second probability function; and performs noise reduction processing on the voice data through the target noise reduction data to obtain target voice data. In this way, the voice data to be noise reduced is separated into two branches, namely the first voice data and the second voice data, and the noise data estimated from the two branches is used to reduce the noise in the noisy voice data. This achieves the technical effect of performing noise reduction according to the noise data in the voice data, and solves the technical problem in the prior art that the accuracy of noise reduction is low.

Alternatively, in this embodiment, the above-mentioned voice noise reduction method may be applied, but is not limited to being applied, to the server 112, to assist with the noise reduction processing of the voice data acquired by an application client. The application client may run, but is not limited to running, in the user device 102, and the user device 102 may be, but is not limited to, a terminal device that supports running the application client, such as a mobile phone, a tablet computer, a notebook computer, or a PC. The server 112 and the user device 102 may exchange data over, but not limited to, a network, which may include, but is not limited to, a wireless network or a wired network. The wireless network includes Bluetooth, Wi-Fi, and other networks that enable wireless communication. The wired network may include, but is not limited to, a wide area network, a metropolitan area network, and a local area network. The above is merely an example, and this embodiment is not limited in this respect.

Optionally, as an optional embodiment, as shown in fig. 2, the voice noise reduction method includes:

Step S202, performing voice separation on voice data to be noise reduced to obtain first voice data and second voice data of the voice data, wherein the proportion of voice data in the first voice data is larger than a first threshold value, and the proportion of noise data in the second voice data is larger than a second threshold value.

Step S204, performing time-frequency transformation on the first voice data to determine a corresponding first probability function, and performing time-frequency transformation on the second voice data to determine a corresponding second probability function.

Step S206, determining target noise reduction data of the voice data through the first probability function and the second probability function.

Step S208, performing noise reduction processing on the voice data through the target noise reduction data to obtain target voice data.

Alternatively, in this embodiment, the voice data to be noise reduced may include, but is not limited to, noisy voice data emitted by a human being and noisy voice data emitted by an animal. That is, the voice data to be noise reduced is voice from whose sound source the noise needs to be removed.

In this embodiment, noise reduction of voice requires dividing the voice data into two branches by sound source separation: the sound source separation module separates the input noisy voice into a voice branch (corresponding to the first voice data) and a noise branch (corresponding to the second voice data). In the voice branch, the voice signal is the main component, with a small amount of noise; in the noise branch, the noise is the main component, with a small amount of voice signal.

It should be noted that after performing time-frequency conversion on the first voice data and the second voice data, the method may further include:

s1, calculating a first characteristic parameter in first voice data, wherein the first characteristic parameter comprises a spectrum flatness characteristic parameter, a log likelihood characteristic parameter and a spectrum difference characteristic parameter;

s2, calculating second characteristic parameters in the second voice data, wherein the second characteristic parameters comprise spectrum flatness characteristic parameters, log likelihood characteristic parameters and spectrum difference characteristic parameters;

s3, determining a first probability function and a second probability function according to the first characteristic parameter and the second characteristic parameter.
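The text does not spell out how the characteristic parameters are calculated. As one hedged example, spectral flatness is commonly defined as the ratio of the geometric mean to the arithmetic mean of the spectrum; the sketch below applies that common definition to a magnitude spectrum. The function name and the `eps` guard are assumptions for illustration, and the log-likelihood and spectral difference parameters are not shown.

```python
import numpy as np

def spectral_flatness(magnitude, eps=1e-12):
    """Geometric mean divided by arithmetic mean of a magnitude spectrum.

    Values near 1 indicate a flat, noise-like spectrum; values near 0
    indicate a peaky spectrum such as voiced speech. `eps` avoids log(0).
    """
    mag = np.asarray(magnitude, dtype=float) + eps
    geometric_mean = np.exp(np.mean(np.log(mag)))
    arithmetic_mean = np.mean(mag)
    return float(geometric_mean / arithmetic_mean)
```

Computed separately on the first voice data and the second voice data, such a value would be one input to each branch's probability function.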

In practical applications, after voice separation, the first voice data and the second voice data are each subjected to time-frequency transformation, that is, each is converted from the time domain to the frequency domain. The spectral flatness feature, the log-likelihood ratio feature, and the spectral difference feature of the first voice data and of the second voice data are then calculated, and the voice/noise probability function of each (that is, the first probability function for the first voice data and the second probability function for the second voice data) is updated based on these three features. Voice activity detection can be performed on the first voice data or the second voice data according to its probability function, to judge whether that data is noise or a voice signal. For example, the first voice data may be judged to be voice.
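The time-domain to frequency-domain conversion can be sketched as a short-time Fourier transform applied to each branch. The frame length, hop size and Hann window below are assumptions chosen for this example, not values given in the text.

```python
import numpy as np

def stft(signal, frame_len=256, hop=128):
    """Split a 1-D signal into Hann-windowed frames and take the FFT of each.

    Returns a complex array of shape (n_frames, frame_len // 2 + 1).
    frame_len and hop are illustrative defaults.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-negative frequencies of each real-valued frame.
    return np.fft.rfft(frames, axis=1)
```

The same function would be called once on the first voice data and once on the second voice data, so that features are computed per branch and per frame.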

Optionally, in this embodiment, determining the target noise reduction data of the voice data through the first probability function and the second probability function may include:

s1, determining first noise data of first voice data according to a first probability function;

s2, under the condition that voice activity detection is carried out according to the first probability function to determine that the first voice data are voice data, determining the first noise data as target noise data;

s3, determining second noise data of second voice data according to a second probability function;

s4, under the condition that voice activity detection carried out according to the first probability function determines that the second voice data is noise data, determining the second noise data as the target noise data.

Optionally, in this embodiment, before determining the first noise data of the first voice data according to the first probability function, the method further includes:

determining the first voice data as voice data under the condition that the first voice data probability value determined according to the first probability function is larger than a threshold value;

and determining the first voice data as noise data under the condition that the probability value of the first voice data determined according to the first probability function is smaller than a threshold value.
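The threshold decision and the choice of target noise data can be sketched together as follows. This follows the reading that voice activity detection is run on the first voice data only and that its outcome selects between the two noise estimates. Averaging the per-bin probabilities into a single frame-level value, the default threshold of 0.5, and the function names are assumptions for this example; the text does not say how a probability exactly equal to the threshold is handled, and the sketch treats it as noise.

```python
import numpy as np

def is_speech(speech_prob, threshold=0.5):
    """Frame-level VAD: mean per-bin speech probability against a threshold."""
    return float(np.mean(speech_prob)) > threshold

def select_target_noise(first_prob, first_noise, second_noise, threshold=0.5):
    """Pick the noise estimate used as target noise data for this frame.

    first_prob   : speech probability of the first voice data (voice branch)
    first_noise  : noise estimate obtained from the first voice data
    second_noise : noise estimate obtained from the second voice data
    """
    if is_speech(first_prob, threshold):
        return np.asarray(first_noise, dtype=float)
    return np.asarray(second_noise, dtype=float)
```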

According to the embodiment provided by the application, voice separation is performed on voice data to be noise reduced to obtain first voice data and second voice data of the voice data, wherein the proportion of voice data in the first voice data is larger than a first threshold value, and the proportion of noise data in the second voice data is larger than a second threshold value; time-frequency transformation is performed on the first voice data to determine a corresponding first probability function, and on the second voice data to determine a corresponding second probability function; target noise reduction data of the voice data is determined through the first probability function and the second probability function; and noise reduction processing is performed on the voice data through the target noise reduction data to obtain target voice data. In this way, the voice data to be noise reduced is separated into two branches, namely the first voice data and the second voice data, and the noise data estimated from the two branches is used to reduce the noise in the noisy voice data. This achieves the technical effect of performing noise reduction according to the noise data in the voice data, and solves the technical problem in the prior art that the accuracy of noise reduction is low.

As an optional embodiment, after performing noise reduction processing on the voice data through the target noise reduction data to obtain the target voice data, the method may further include:

transforming the target voice data from the frequency domain to the time domain by using an inverse short-time Fourier transform to obtain reconstructed target voice data.

Optionally, in this embodiment, the target voice data is transformed from the frequency domain to the time domain to obtain reconstructed target voice data, and voice recognition is performed on the reconstructed target voice data, which addresses the problem of a low voice recognition rate caused by the poor performance of current voice noise reduction.
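The frequency-domain to time-domain reconstruction can be sketched as an inverse FFT of each frame followed by weighted overlap-add. The sketch assumes the frames were produced with a Hann analysis window of length `frame_len` and hop `hop`; these parameters and the function name are illustrative and not taken from the text.

```python
import numpy as np

def istft(spec, frame_len=256, hop=128):
    """Weighted overlap-add inverse of a Hann-windowed STFT.

    spec has shape (n_frames, frame_len // 2 + 1). The same Hann window is
    applied again at synthesis, and the result is normalised by the summed
    squared window so that overlapping frames add back to the original.
    """
    spec = np.asarray(spec)
    window = np.hanning(frame_len)
    n_frames = spec.shape[0]
    out_len = frame_len + hop * (n_frames - 1)
    output = np.zeros(out_len)
    window_sum = np.zeros(out_len)
    frames = np.fft.irfft(spec, n=frame_len, axis=1)
    for i, frame in enumerate(frames):
        start = i * hop
        output[start:start + frame_len] += frame * window
        window_sum[start:start + frame_len] += window ** 2
    # Samples where the window sum is zero (the two end points) stay at zero.
    nonzero = window_sum > 1e-8
    output[nonzero] /= window_sum[nonzero]
    return output
```

When the spectrum is left unmodified, this recovers the analysed signal everywhere except the first and last sample, where the Hann window is exactly zero.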

As an alternative embodiment, the present application also provides a method for voice noise reduction based on sound source separation.

To improve the estimation of the signal and the noise, and thereby the noise reduction effect, this embodiment provides a noise reduction method based on sound source separation. The input noisy signal first undergoes sound source separation to obtain a signal component carrying a small amount of noise and a noise component that is almost pure noise. Noise and signal estimation is performed on each of the two paths, and the noise estimate required by the Wiener filter is finally selected according to the voice activity detection (VAD) result of the signal branch: if the VAD decision is noise, the noise estimate of the noise branch is used for Wiener filtering; if the VAD decision is speech, the noise estimate of the signal branch is used for Wiener filtering.
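Once a noise estimate has been selected, a frequency-domain Wiener gain can be derived from it. The sketch below uses the simple form gain = 1 - noise power / noisy power with a lower bound; this is a textbook simplification and an assumption of this example, not the formula of the embodiment or of WebRTC. The `gain_floor` parameter is likewise hypothetical.

```python
import numpy as np

def wiener_gain(noisy_magnitude, noise_magnitude, gain_floor=0.1):
    """Per-bin Wiener-style gain from a noisy spectrum and a noise estimate.

    The gain is 1 - noise_power / noisy_power, limited to [gain_floor, 1]
    so that bins dominated by noise are attenuated rather than zeroed.
    """
    noisy_power = np.asarray(noisy_magnitude, dtype=float) ** 2
    noise_power = np.asarray(noise_magnitude, dtype=float) ** 2
    gain = 1.0 - noise_power / np.maximum(noisy_power, 1e-12)
    return np.clip(gain, gain_floor, 1.0)
```

Multiplying the noisy spectrum by this gain gives the noise-reduced spectrum; the quality of the result depends directly on which branch supplied `noise_magnitude`.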

FIG. 3 shows the algorithm flow chart of the voice noise reduction method based on sound source separation. The specific algorithm flow is as follows:

1. Sound source separation. The sound source separation module separates the input noisy voice into a voice branch and a noise branch. In the voice branch (corresponding to the first voice data), the voice signal is the main component, with a small amount of noise; in the noise branch (corresponding to the second voice data), the noise is the main component, with a small amount of voice signal.

2. Time-frequency transformation. The voice branch signal and the noise branch signal are each transformed to the frequency domain (corresponding to the time-frequency transformation of the first voice data and the second voice data).

3. Feature extraction. For each branch, the spectral flatness feature, the log-likelihood ratio feature and the spectral difference feature are calculated, and the voice/noise probability function is updated according to these three features.

4. VAD calculation. For the voice branch, the probability function is compared with a threshold to perform voice activity detection (VAD): a probability greater than the threshold is judged as voice, and a probability less than the threshold is judged as noise. Each of the two branches obtains its own noise estimate according to its own probability function.

5. Wiener filtering. According to the VAD result obtained in step 4, if the voice branch VAD decision is voice, the frequency-domain Wiener filter coefficients are calculated using the noise estimate of the voice branch; if the voice branch VAD decision is noise, the frequency-domain Wiener filter coefficients are calculated using the noise estimate of the noise branch.

6. Signal reconstruction. The signal is transformed from the frequency domain back to the time domain using the inverse short-time Fourier transform.
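Steps 4 and 5 of the flow above can be sketched for a single frame as follows. The sketch takes the per-branch speech probabilities as inputs, because the mapping from the three features to a probability is not specified here. The recursive noise update, the simplified Wiener gain, the `state` dictionary and all parameter defaults are assumptions made for illustration only.

```python
import numpy as np

def denoise_frame(mixture_spec, voice_mag, noise_mag, voice_prob, noise_prob,
                  state, vad_threshold=0.5, smoothing=0.9, gain_floor=0.1):
    """Process one frame; `state` maps "voice"/"noise" to noise estimates."""
    def update(prev, mag, prob):
        # Soft-decision update: likely-speech bins keep the old estimate.
        expected = prob * prev + (1.0 - prob) * mag
        return smoothing * prev + (1.0 - smoothing) * expected

    mixture_spec = np.asarray(mixture_spec)
    voice_prob = np.asarray(voice_prob, dtype=float)
    noise_prob = np.asarray(noise_prob, dtype=float)

    # Step 4: each branch updates its own noise estimate.
    state["voice"] = update(state["voice"], np.asarray(voice_mag, float), voice_prob)
    state["noise"] = update(state["noise"], np.asarray(noise_mag, float), noise_prob)

    # Step 4: voice activity detection on the voice branch only.
    voice_active = float(np.mean(voice_prob)) > vad_threshold

    # Step 5: select the noise estimate, then build and apply the gain.
    target_noise = state["voice"] if voice_active else state["noise"]
    noisy_power = np.abs(mixture_spec) ** 2
    gain = 1.0 - target_noise ** 2 / np.maximum(noisy_power, 1e-12)
    gain = np.clip(gain, gain_floor, 1.0)
    return gain * mixture_spec, voice_active
```

Calling this function frame by frame on the spectra from step 2 and passing the returned spectra to step 6 would complete the flow; the sound source separation of step 1 is assumed to be provided by a separate module.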

The embodiment provided by the application has the following advantages. Better noise reduction performance: the invention separates the input audio into a voice branch and a noise branch and estimates the noise and the signal on each branch separately, so the noise and voice estimates are more accurate and the noise reduction performance is better. Low algorithm complexity: the invention can be obtained by directly adding sound source separation on top of the open-source code, so the implementation difficulty and complexity of the algorithm are low.

It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present invention. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present invention.

According to another aspect of the embodiment of the present invention, there is also provided a voice noise reduction apparatus for implementing the above voice noise reduction method. As shown in fig. 4, the voice noise reduction apparatus includes: a separation unit 41, a first determination unit 43, a second determination unit 45, and a noise reduction unit 47.

A separation unit 41, configured to perform voice separation on voice data to be noise reduced to obtain first voice data and second voice data of the voice data, where a proportion of the voice data in the first voice data is greater than a first threshold value, and a proportion of the noise data in the second voice data is greater than a second threshold value;

a first determining unit 43, configured to perform time-frequency transformation on the first voice data to determine a corresponding first probability function, and perform time-frequency transformation on the second voice data to determine a corresponding second probability function;

a second determining unit 45 for determining target noise reduction data of the voice data by the first probability function and the second probability function;

the noise reduction unit 47 is configured to perform noise reduction processing on the voice data through the target noise reduction data, so as to obtain target voice data.

Optionally, the second determining unit 45 may include:

the first determining module is used for determining first noise data of the first voice data according to a first probability function;

the second determining module is used for determining the first noise data as target noise data under the condition that the first voice data are determined to be voice data by voice activity detection according to the first probability function;

a third determining module for determining second noise data of the second voice data according to a second probability function;

and the fourth determining module is used for determining the second noise data as target noise data when the second voice data is determined to be the noise data by voice activity detection according to the first probability function.

Through the embodiment provided by the application, the separation unit 41 performs voice separation on voice data to be noise reduced to obtain first voice data and second voice data of the voice data, wherein the proportion of voice data in the first voice data is larger than a first threshold value, and the proportion of noise data in the second voice data is larger than a second threshold value; the first determining unit 43 performs time-frequency transformation on the first voice data to determine a corresponding first probability function, and performs time-frequency transformation on the second voice data to determine a corresponding second probability function; the second determining unit 45 determines target noise reduction data of the voice data through the first probability function and the second probability function; and the noise reduction unit 47 performs noise reduction processing on the voice data through the target noise reduction data to obtain target voice data.

As an alternative embodiment, the apparatus may further include:

the first computing unit is used for computing first characteristic parameters in the first voice data after performing time-frequency conversion on the first voice data and the second voice data respectively, wherein the first characteristic parameters comprise spectrum flatness characteristic parameters, log likelihood characteristic parameters and spectrum difference characteristic parameters;

the second computing unit is used for computing second characteristic parameters in the second voice data, wherein the second characteristic parameters comprise spectrum flatness characteristic parameters, log likelihood characteristic parameters and spectrum difference characteristic parameters;

and determining a first probability function and a second probability function according to the first characteristic parameter and the second characteristic parameter.

As an alternative embodiment, the apparatus may further include:

a third determining unit, configured to determine, before determining the first noise data of the first voice data according to the first probability function, that the first voice data is voice data if the first voice data probability value determined according to the first probability function is greater than a threshold value;

and a fourth determining unit, configured to determine that the first voice data is noise data when the first voice data probability value determined according to the first probability function is smaller than the threshold value.
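The two determining units above amount to a threshold comparison on a probability value. A minimal sketch, assuming the probability is a float in [0, 1]; the name `classify_branch` and the default threshold of 0.5 are hypothetical, since no threshold value is given here. The text does not cover the case where the probability equals the threshold, and the sketch treats it as noise by assumption.

```python
def classify_branch(speech_probability, threshold=0.5):
    """Label a branch as 'voice' or 'noise' from its speech probability."""
    if not 0.0 <= speech_probability <= 1.0:
        raise ValueError("speech_probability must lie in [0, 1]")
    # Greater than the threshold: voice data. Smaller: noise data.
    # Equality is not specified in the text; it is treated as noise here.
    return "voice" if speech_probability > threshold else "noise"
```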

As an alternative embodiment, the apparatus may further include:

and a third obtaining unit, configured to, after noise reduction processing has been performed on the voice data using the target noise reduction data to obtain the target voice data, transform the target voice data from the frequency domain to the time domain using the inverse short-time Fourier transform, so as to obtain the reconstructed target voice data.
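Transforming frame-wise spectra back to a waveform is usually done with an inverse short-time Fourier transform followed by overlap-add. The sketch below is a pure-Python round trip under stated assumptions: a periodic Hann window, a frame length of 8, a hop of 4, and a naive DFT. None of these choices come from the embodiment, and all function names are illustrative.

```python
import cmath
import math

def dft(frame):
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def hann(n):
    # Periodic Hann window (assumed window choice).
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def stft(signal, frame_len=8, hop=4):
    """Window each frame and take its DFT (time domain to frequency domain)."""
    window = hann(frame_len)
    return [dft([signal[s + i] * window[i] for i in range(frame_len)])
            for s in range(0, len(signal) - frame_len + 1, hop)]

def istft(frames, frame_len=8, hop=4):
    """Inverse DFT per frame, then weighted overlap-add back to a waveform."""
    window = hann(frame_len)
    length = hop * (len(frames) - 1) + frame_len
    out = [0.0] * length
    norm = [0.0] * length
    for idx, spectrum in enumerate(frames):
        start = idx * hop
        for i, sample in enumerate(idft(spectrum)):
            out[start + i] += sample * window[i]   # synthesis window
            norm[start + i] += window[i] ** 2      # accumulated window energy
    # Divide by the window energy; samples with no window coverage stay 0.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]

signal = [math.sin(0.3 * n) for n in range(16)]
reconstructed = istft(stft(signal))
```

In a real pipeline the noise reduction would modify the spectra between `stft` and `istft`; with unmodified spectra the reconstruction matches the input wherever the windows overlap with non-zero weight.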

According to a further aspect of the embodiments of the present invention, there is also provided an electronic device for implementing the above-mentioned method of speech noise reduction, as shown in fig. 5, the electronic device comprising a memory 502 and a processor 504, the memory 502 having stored therein a computer program, the processor 504 being arranged to perform the steps of any of the method embodiments described above by means of the computer program.

Alternatively, in this embodiment, the electronic apparatus may be located in at least one network device of a plurality of network devices of the computer network.

Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:

S1, performing voice separation on voice data to be subjected to noise reduction to obtain first voice data and second voice data of the voice data, wherein the proportion of the voice data in the first voice data is larger than a first threshold value, and the proportion of the noise data in the second voice data is larger than a second threshold value;

S2, performing time-frequency transformation on the first voice data to determine a corresponding first probability function, and performing time-frequency transformation on the second voice data to determine a corresponding second probability function;

S3, determining target noise reduction data of the voice data through the first probability function and the second probability function;

S4, performing noise reduction processing on the voice data through the target noise reduction data to obtain the target voice data.
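The last two steps can be sketched on per-bin magnitude spectra. The following is a toy illustration only, not the embodiment's implementation. It assumes the probability values are speech probabilities, that the second branch counts as noise when its probability is below the threshold, that two qualifying noise estimates are averaged per bin, and that the noise reduction itself is spectral subtraction with a spectral floor. All names, the threshold, and the floor value are hypothetical.

```python
def select_target_noise(p_first, p_second, noise_first, noise_second, threshold=0.5):
    """Third step: pick the noise estimate(s) used as target noise reduction data."""
    selected = []
    if p_first > threshold:      # first branch judged to be voice data
        selected.append(noise_first)
    if p_second < threshold:     # second branch judged to be noise data
        selected.append(noise_second)
    if not selected:
        return None
    # Assumption: when both branches qualify, average their estimates per bin.
    return [sum(bins) / len(selected) for bins in zip(*selected)]

def reduce_noise(mixture_mag, target_noise, floor=0.05):
    """Fourth step: spectral subtraction with a floor (an assumed technique)."""
    if target_noise is None:
        return list(mixture_mag)
    return [max(m - n, floor * m) for m, n in zip(mixture_mag, target_noise)]

mixture = [1.0, 0.9, 0.2, 0.2]   # magnitude spectrum of the noisy frame
noise_a = [0.2, 0.2, 0.2, 0.2]   # noise estimate from the first branch
noise_b = [0.1, 0.1, 0.2, 0.2]   # noise estimate from the second branch
target = select_target_noise(0.9, 0.1, noise_a, noise_b)
cleaned = reduce_noise(mixture, target)
```

The floor keeps bins from being driven to zero, which is a common way to limit audible artifacts in subtraction-based methods.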

It will be understood by those skilled in the art that the structure shown in fig. 5 is only schematic, and the electronic device may also be a terminal device such as a smart phone (e.g. an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a mobile internet device (MID), or a PAD. Fig. 5 does not limit the structure of the electronic device. For example, the electronic device may also include more or fewer components (e.g., network interfaces, etc.) than shown in fig. 5, or have a different configuration from that shown in fig. 5.

The memory 502 may be used to store software programs and modules, such as the program instructions/modules corresponding to the voice noise reduction method and apparatus in the embodiment of the present invention, and the processor 504 executes various functional applications and data processing by running the software programs and modules stored in the memory 502, that is, implements the voice noise reduction method described above. The memory 502 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 502 may further include memory located remotely from the processor 504, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 502 may be used specifically, but not exclusively, to store information such as the voice data to be noise-reduced, the separated first voice data, and the second voice data. As an example, as shown in fig. 5, the memory 502 may include, but is not limited to, the separation unit 41, the first determination unit 43, the second determination unit 45, and the noise reduction unit 47 of the above voice noise reduction apparatus. It may further include, but is not limited to, other module units of the above voice noise reduction apparatus, which are not described in detail in this example.

Optionally, the transmission device 506 is configured to receive or transmit data via a network. Specific examples of the network described above may include wired networks and wireless networks. In one example, the transmission device 506 includes a network adapter (Network Interface Controller, NIC) that may be connected to other network devices and routers via a network cable to communicate with the internet or a local area network. In one example, the transmission device 506 is a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.

According to a further aspect of embodiments of the present invention, there is also provided a computer readable storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.

Optionally, in the present embodiment, the above-described computer-readable storage medium may be configured to store a computer program for executing the steps of the speech noise reduction method described above (steps S1 to S4).

Optionally, in this embodiment, those skilled in the art will understand that all or part of the steps in the methods of the above embodiments may be completed by a program that instructs a terminal device to execute them, where the program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.

The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.

The integrated units in the above embodiments may be stored in the above-described computer-readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to perform all or part of the steps of the methods described in the embodiments of the present invention.

In the foregoing embodiments of the present invention, the description of each embodiment has its own emphasis; for a part that is not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The apparatus embodiments described above are merely exemplary. For example, the division of the units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection shown or discussed between the parts may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations are also intended to fall within the scope of protection of the present invention.

Claims

1. A method of voice noise reduction, comprising:

performing voice separation on voice data to be subjected to noise reduction to obtain first voice data and second voice data, wherein the proportion of the voice data in the first voice data is larger than a first threshold value, and the proportion of the noise data in the second voice data is larger than a second threshold value;

performing time-frequency transformation on the first voice data to determine a corresponding first probability function, and performing time-frequency transformation on the second voice data to determine a corresponding second probability function;

determining target noise reduction data of the voice data through the first probability function and the second probability function;

noise reduction processing is carried out on the voice data through the target noise reduction data, so that target voice data are obtained;

wherein determining the target noise reduction data of the voice data through the first probability function and the second probability function comprises:

determining first noise data of the first voice data according to the first probability function;

determining the first noise data as the target noise reduction data in a case where voice activity detection performed according to the first probability function determines that the first voice data is the voice data;

determining second noise data of the second speech data according to the second probability function;

and determining the second noise data as the target noise reduction data in a case where voice activity detection performed according to the second probability function determines that the second voice data is the noise data.

2. The method of claim 1, wherein prior to determining the first noise data of the first speech data according to the first probability function, the method comprises:

determining that the first voice data is the voice data under the condition that the probability value of the first voice data determined according to the first probability function is larger than a threshold value;

and determining the first voice data as the noise data under the condition that the probability value of the first voice data determined according to the first probability function is smaller than a threshold value.

3. The method of claim 1, wherein after time-frequency transforming the first voice data and the second voice data, respectively, the method comprises:

calculating a first characteristic parameter in the first voice data, wherein the first characteristic parameter comprises a spectrum flatness characteristic parameter, a log likelihood characteristic parameter and a spectrum difference characteristic parameter;

calculating a second characteristic parameter in the second voice data, wherein the second characteristic parameter comprises a spectrum flatness characteristic parameter, a log likelihood characteristic parameter and a spectrum difference characteristic parameter;

and determining the first probability function and the second probability function according to the first characteristic parameter and the second characteristic parameter.

4. The method according to claim 1, wherein after performing noise reduction processing on the voice data by the noise reduction data to obtain target voice data, the method comprises:

and transforming the target voice data from a frequency domain to a time domain by utilizing short-time Fourier transformation to obtain reconstructed target voice data.

5. A speech noise reduction device, comprising:

the separation unit is used for carrying out voice separation on voice data to be subjected to noise reduction to obtain first voice data and second voice data of the voice data, wherein the proportion of the voice data in the first voice data is larger than a first threshold value, and the proportion of the noise data in the second voice data is larger than a second threshold value;

the first determining unit is used for performing time-frequency transformation on the first voice data to determine a corresponding first probability function and performing time-frequency transformation on the second voice data to determine a corresponding second probability function;

a second determining unit configured to determine target noise reduction data of the speech data by the first probability function and the second probability function;

the noise reduction unit is used for carrying out noise reduction processing on the voice data through the target noise reduction data to obtain target voice data;

the second determination unit includes:

a first determining module, configured to determine first noise data of the first voice data according to the first probability function;

the second determining module is used for determining the first noise data as the target noise reduction data under the condition that the first voice data are determined to be the voice data by voice activity detection according to the first probability function;

a third determining module, configured to determine second noise data of the second speech data according to the second probability function;

and a fourth determining module, configured to determine the second noise data as the target noise reduction data in a case where voice activity detection performed according to the second probability function determines that the second voice data is the noise data.

6. The apparatus of claim 5, wherein the apparatus comprises:

a second calculating unit, configured to calculate a second feature parameter in the second voice data, where the second feature parameter includes a spectral flatness feature parameter, a log likelihood feature parameter, and a spectral difference feature parameter;

7. A computer readable storage medium comprising a stored program, wherein the program when run performs the method of any one of the preceding claims 1 to 4.

8. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method according to any of the claims 1-4 by means of the computer program.