CN107483761A

CN107483761A - A kind of echo suppressing method and device

Info

Publication number: CN107483761A
Application number: CN201610399409.4A
Authority: CN
Inventors: 梁民; 汪法兵; 沙永涛
Original assignee: China Academy of Telecommunications Technology CATT
Current assignee: China Academy of Telecommunications Technology CATT
Priority date: 2016-06-07
Filing date: 2016-06-07
Publication date: 2017-12-15
Anticipated expiration: 2036-06-07
Also published as: CN107483761B

Abstract

The invention discloses an echo suppression method and device. The method includes: suppressing a linear echo component in a first audio signal to obtain a first echo suppression signal; suppressing, according to the first echo suppression signal, a nonlinear echo component in a second audio signal to obtain a second echo suppression signal, wherein the first audio signal and the second audio signal are acquired by audio receivers in an audio receiver array; and suppressing, respectively, a linear echo component and a residual nonlinear echo component in the second echo suppression signal. The invention can effectively suppress linear and nonlinear acoustic echoes.

Description

Echo suppression method and device

Technical Field

The present invention relates to the field of communications technologies, and in particular, to an echo suppression method and apparatus.

Background

Hands-free calling is one of the application features generally required for devices having a communication function, such as mobile phones and tablet computers. For the hands-free call mode, the acoustic echo suppression technology is an important technology for improving the call quality, and the real-time changing characteristic of the acoustic echo brings great technical challenges to the acoustic echo suppression technology. In the prior art, adaptive filters are generally used in echo suppression techniques to estimate echoes, since they have the function of rapidly tracking real-time changing signals in an unknown environment.

Because the speaker on a mobile phone or tablet computer is small and sits close to the microphone, and because the sound emitted from the speaker is usually loud in the hands-free call mode, the echo signal picked up at the microphone is often stronger than the near-end voice signal. Furthermore, small loudspeakers give rise to nonlinear acoustic echoes, both when they play loud sounds and when components in the echo path vibrate. As a result, the existing single-channel acoustic echo suppressor based on a linear adaptive filter cannot effectively cancel and suppress the acoustic echo of such a communication device in the hands-free call mode.

To address the nonlinear characteristics of the acoustic echo of a communication device such as a mobile phone in the hands-free call mode, some existing technologies use a nonlinear echo suppressor. However, when the nonlinear residual echo component is large, these solutions inevitably suppress, and thereby damage, the near-end speech signal while suppressing the nonlinear residual echo. In particular, when the amplitude of the nonlinear echo is equal to the amplitude of the near-end speech signal, the near-end speech signal is severely damaged, and the speech signal received at the target receiving end may even be completely distorted.

Therefore, how to implement an echo suppression technique capable of effectively suppressing linear and nonlinear acoustic echoes is a problem to be researched and explored in the industry.

Disclosure of Invention

The embodiment of the invention provides an echo suppression method and an echo suppression device, which are used for realizing an echo suppression technology capable of effectively suppressing linear and nonlinear acoustic echoes.

Some embodiments of the invention provide an echo suppression method, comprising:

suppressing a linear echo component in the first audio signal to obtain a first echo suppression signal;

according to the first echo suppression signal, suppressing a nonlinear echo component in a second audio signal to obtain a second echo suppression signal; wherein the first audio signal and the second audio signal are acquired by audio receivers in an audio receiver array;

and suppressing a linear echo component and a residual nonlinear echo component in the second echo suppression signal.

In some optional embodiments of the invention, suppressing a linear echo component in a first audio signal to obtain a first echo suppression signal includes:

filtering the far-end audio signal by using a first self-adaptive filter to obtain a first linear echo component;

and according to the first linear echo component, suppressing the linear echo component in the first audio signal to obtain a first echo suppression signal.

Some optional embodiments of the invention further comprise: if the current state is determined to be the single-talk state, updating the coefficient vector of the first adaptive filter.

In some optional embodiments of the invention, suppressing, according to the first echo suppression signal, a nonlinear echo component in a second audio signal to obtain a second echo suppression signal includes:

filtering the first echo suppression signal by using a second adaptive filter to obtain a first nonlinear echo component;

and according to the first nonlinear echo component, suppressing the nonlinear echo component in the second audio signal to obtain a second echo suppression signal.

Some optional embodiments of the invention further comprise: if the current state is determined to be the single-talk state and the average power of the first echo suppression signal is larger than a first preset threshold value, or if the current state is determined to be the single-talk state and the average power of a first linear echo component is larger than a second preset threshold value, updating the coefficient vector of the second adaptive filter, wherein the first linear echo component is obtained by filtering the far-end audio signal by using the first adaptive filter.

In some optional embodiments of the invention, suppressing the linear echo component and the residual nonlinear echo component in the second echo suppression signal comprises:

filtering the far-end audio signal by using a third adaptive filter to obtain a second linear echo component;

according to the second linear echo component, suppressing the linear echo component in the second echo suppression signal to obtain a third echo suppression signal;

and according to the second echo suppression signal and the third echo suppression signal, using a nonlinear echo suppressor to suppress residual nonlinear echo components in the third echo suppression signal.

Some optional embodiments of the invention further comprise: if the current state is determined to be the single-talk state, updating the coefficient vector of the third adaptive filter.

In some optional embodiments of the present invention, the following manner is adopted to determine whether the current state is the single talk state:

extracting the voiceprint feature vectors of the first audio signal and the first linear echo component respectively according to the first audio signal and the first linear echo component;

calculating a similarity between a voiceprint feature vector of the first audio signal and a voiceprint feature vector of the first linear echo component;

if the calculated similarity is larger than a preset threshold value, judging that the mobile phone is currently in a double-talk state, and otherwise, judging that the mobile phone is currently in a single-talk state.

In some alternative embodiments of the present invention, the audio receivers in the audio receiver array are arranged in an end fire array.

Some embodiments of the invention provide an echo suppression device comprising:

the first echo suppression unit is used for suppressing a linear echo component in the first audio signal to obtain a first echo suppression signal;

the second echo suppression unit is used for suppressing a nonlinear echo component in a second audio signal according to the first echo suppression signal to obtain a second echo suppression signal; wherein the first audio signal and the second audio signal are acquired by audio receivers in an audio receiver array;

and the third echo suppression unit is used for suppressing the linear echo component and the residual nonlinear echo component in the second echo suppression signal.

In some optional embodiments of the invention, the first echo suppressing unit comprises:

the first adaptive filter is used for filtering the far-end audio signal to obtain a first linear echo component;

and the first echo suppression module is used for suppressing the linear echo component in the first audio signal according to the first linear echo component to obtain a first echo suppression signal.

In some optional embodiments of the invention, the first adaptive filter is further configured to update its coefficient vector when the current state is the single-talk state.

In some optional embodiments of the invention, the second echo suppressing unit comprises:

the second self-adaptive filter is used for filtering the first echo suppression signal to obtain a first nonlinear echo component;

and the second echo suppression module is used for suppressing the nonlinear echo component in the second audio signal according to the first nonlinear echo component to obtain a second echo suppression signal.

In some optional embodiments of the invention, the second adaptive filter is further configured to update its coefficient vector when the current state is the single-talk state and the average power of the first echo suppression signal is larger than a first preset threshold value, or when the current state is the single-talk state and the average power of a first linear echo component is larger than a second preset threshold value, wherein the first linear echo component is obtained by filtering a far-end audio signal through the first adaptive filter.

In some optional embodiments of the invention, the third echo suppressing unit comprises:

the third adaptive filter is used for filtering the far-end audio signal to obtain a second linear echo component;

the third echo suppression module is configured to suppress a linear echo component in the second echo suppression signal according to the second linear echo component, so as to obtain a third echo suppression signal;

and the nonlinear echo suppressor is used for suppressing the residual nonlinear echo component in the third echo suppression signal according to the second echo suppression signal and the third echo suppression signal.

In some optional embodiments of the invention, the third adaptive filter is further configured to update its coefficient vector when the current state is the single-talk state.

Some optional embodiments of the invention further comprise a judging unit, which is configured to: extract the voiceprint feature vectors of the first audio signal and of the first linear echo component from the first audio signal and the first linear echo component, respectively; calculate a similarity between the voiceprint feature vector of the first audio signal and the voiceprint feature vector of the first linear echo component; and, if the calculated similarity is larger than a preset threshold value, judge that the current state is the double-talk state, and otherwise judge that the current state is the single-talk state.

In the above embodiment of the present invention, for a first audio signal and a second audio signal acquired by an audio receiver in an audio receiver array, a first echo suppression signal is obtained by suppressing a linear echo component in the first audio signal, a second echo suppression signal is obtained by suppressing a non-linear echo component in the second audio signal according to the first echo suppression signal, and finally a linear echo component and a residual non-linear echo component in the second echo suppression signal are suppressed, so that suppression of linear and non-linear acoustic echoes in the audio signals is achieved.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a schematic diagram of a prior art single-microphone nonlinear acoustic echo canceller;

FIG. 2 is a schematic diagram of an exemplary structure of an echo suppression solution according to an embodiment of the present invention;

FIG. 3 is a flow chart illustrating an echo suppression method according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of an exemplary structure of an echo suppression solution using a controller according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an echo suppression device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

To address the problem that the prior art based on a linear adaptive filter cannot effectively handle the nonlinear characteristics of the acoustic echo of devices with communication functions, such as mobile phones and tablet computers, in the hands-free call mode, some prior art has proposed using a nonlinear adaptive filter, such as an artificial neural network, to implement single-channel acoustic echo suppression. However, the algorithm of such a nonlinear adaptive filter is complex and computationally expensive, so it cannot be implemented well on current commercial Digital Signal Processing (DSP) chips.

Fig. 1 shows a schematic structural diagram of a Non-linear acoustic Echo canceller (NLAEC) in the prior art.

A single-microphone NLAEC, also known as a post-filtering nonlinear acoustic echo suppressor, includes a Finite Impulse Response (FIR) linear adaptive filter 101 and a nonlinear echo suppressor (NLES) 102, as shown in fig. 1. When the speaker 104 in the acoustic system plays sound after receiving a far-end audio signal, the played sound travels through the external environment and produces linear and nonlinear acoustic echoes at the microphone 103, so that the microphone 103 picks up an acoustic echo signal corresponding to the sound played by the speaker 104. In the NLAEC, the FIR linear adaptive filter 101 is first applied to estimate the linear component of the acoustic echo signal picked up by the microphone 103, and this estimate is subtracted from the signal received by the microphone 103; the NLES 102 is then applied to suppress the residual nonlinear component of the acoustic echo signal, thereby cancelling and suppressing the acoustic echo.

However, when the nonlinear residual echo component is large, the nonlinear echo suppressor (NLES 102) in the single-microphone NLAEC shown in fig. 1 will inevitably suppress and damage the near-end speech signal when suppressing the nonlinear residual echo, and especially when the amplitude of the nonlinear echo is equal to that of the near-end speech signal, the nonlinear echo suppressor NLES has a large loss to the near-end speech signal.

From the above analysis, it can be seen that the single-microphone NLAEC using post-filtering in the prior art is not an ideal technical solution for echo suppression, while the non-linear adaptive filter such as an artificial neural network is not easy to be implemented in engineering due to the complexity of calculation, and there is no echo suppression technology that can effectively suppress linear and non-linear acoustic echoes.

To overcome the defects of the existing echo suppression solutions, the invention provides an echo suppression solution capable of effectively suppressing linear and nonlinear acoustic echoes. In the embodiment of the invention, the audio receivers in the audio receiver array receive multi-channel audio signals; the linear echo component of one channel of audio signal is suppressed; the nonlinear echo component of another channel of audio signal is suppressed according to that suppression result; and the remaining linear echo component and residual nonlinear echo component are then further suppressed, so that linear and nonlinear acoustic echoes are effectively suppressed.

For example, taking an audio receiver array that includes two audio receivers (say, two microphones A and B), the echo suppression technology provided in the embodiment of the present invention introduces an FIR adaptive filter in the branch of microphone A to construct a Linear Acoustic Echo Canceller (LAEC), whose output is the nonlinear echo component of the echo signal. This nonlinear echo component is fed into a second FIR adaptive filter, so that the adaptive null former (ANF) formed by the second FIR adaptive filter and microphone B focuses on suppressing the nonlinear echo component. The remaining linear echo component and the small residual nonlinear echo component are then processed, respectively, by the FIR adaptive filter and the nonlinear echo suppressor (NLES) of a subsequent conventional nonlinear acoustic echo canceller (NLAEC) on the branch of microphone B. Because the nonlinear component in the input echo signal of the NLAEC is therefore weak, the attenuation that the subsequent NLES imposes on the near-end speech signal is greatly reduced. The echo suppression technology provided by the embodiment of the present invention can thus not only effectively suppress linear and nonlinear acoustic echoes, but also reduce the damage that the NLAEC causes to the near-end speech signal.

Fig. 2 schematically illustrates an example structure of an echo suppression solution according to an embodiment of the present invention. As shown in fig. 2, the example structure includes a Linear Acoustic Echo Canceller (LAEC) 201, an adaptive null former (ANF) 202, and a nonlinear acoustic echo canceller (NLAEC) 203. The LAEC 201 includes an FIR adaptive filter (denoted as AF3); the ANF 202 includes a delay module (denoted as D) and an FIR adaptive filter (denoted as AF2); and the NLAEC 203 includes an FIR adaptive filter (denoted as AF1) and a nonlinear echo suppressor (NLES).

It should be noted that the FIR adaptive filter generally refers to an adaptive filter adopting a FIR structure, and the adaptive filter is generally a filter that automatically adjusts filter coefficients to achieve optimal filter characteristics by using an algorithm based on the estimation of the statistical characteristics of input and output signals, and may be implemented in a software manner, a hardware manner, or a combination of software and hardware. The present application does not limit the specific implementation of the FIR adaptive filter.

As shown in fig. 2, the speaker 204 generates an echo signal when it plays sound based on the received far-end audio signal. The LAEC 201 is located on the branch of the microphone 205: after the echo signal received by the microphone 205 is processed by AF3 in the LAEC 201, the nonlinear echo component of that echo signal is output to the ANF 202, which is built around AF2 and located on the branch of the microphone 206, so that the ANF 202 can focus on suppressing the nonlinear echo component. Because the ANF 202 performs echo suppression on the echo signal received by the microphone 206 according to the echo suppression result that the LAEC 201 produces for the echo signal received by the microphone 205, the delay module D delays the echo signal received by the microphone 206 so as to satisfy causality in time. All linear echo components and the small residual nonlinear echo component remaining in the output signal of the ANF 202 are then processed, respectively, by the subsequent conventional NLAEC 203 on the branch of the microphone 206, thereby suppressing both the linear and the nonlinear echo components.

The exemplary structure of the echo suppression solution according to the embodiment of the present invention shown in fig. 2 may also be referred to as a Dual-Microphone Nonlinear Acoustic Echo Canceller (DMNLAEC).

Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

Fig. 3 is a schematic flow chart illustrating an echo suppression method according to an embodiment of the present invention. Alternatively, the flow may be applied in the example structure shown in fig. 2 or implemented in the example structure shown in fig. 2.

As shown in fig. 3, a flow of an echo suppression method according to an embodiment of the present invention includes the following steps:

step 301: suppressing a linear echo component in the first audio signal to obtain a first echo suppression signal;

step 302: according to the first echo suppression signal, suppressing a nonlinear echo component in the second audio signal to obtain a second echo suppression signal;

the first audio signal and the second audio signal are acquired by audio receivers in an audio receiver array;

step 303: and suppressing the linear echo component and the residual nonlinear echo component in the second echo suppression signal.

In some embodiments of the invention, the number of audio receivers in the array of audio receivers may be two or more. Optionally, a first audio receiver in the array of audio receivers receives the first audio signal and a second audio receiver in the array of audio receivers receives the second audio signal.

Alternatively, the audio receiver arrays may be arranged in an end-fire array.

For example, when some embodiments of the present invention are applied in a mobile communication device, due to the limited space of the mobile communication device, it is often only possible to place 2 audio receivers (e.g., microphones) at short intervals, typically between 5 and 20 mm apart, optionally with these 2 microphones having short intervals placed in an end-fire manner in the device.

In some embodiments of the invention, step 301 of suppressing the linear echo component in the first audio signal to obtain the first echo suppression signal may proceed as follows: the far-end audio signal is filtered by a first adaptive filter (such as AF3 shown in fig. 2) to obtain the first linear echo component; then, according to the first linear echo component, the linear echo component in the first audio signal is suppressed to obtain the first echo suppression signal.

In some embodiments of the present invention, if it is determined that the current state is the single-talk state, the coefficient vector of the first adaptive filter may be further updated.

The single-talk state is a state in which, in a communication device including an audio receiver and an audio player, only the audio player plays sound based on the far-end audio signal. Correspondingly, the double-talk state is a state in which the audio receiver receives a near-end audio signal generated by a near-end user while the audio player plays sound based on the far-end audio signal.

In some embodiments of the invention, step 302 of suppressing the nonlinear echo component in the second audio signal according to the first echo suppression signal to obtain the second echo suppression signal may proceed as follows: the first echo suppression signal is filtered by a second adaptive filter (such as AF2 shown in fig. 2) to obtain a first nonlinear echo component; then, according to the first nonlinear echo component, the nonlinear echo component in the second audio signal is suppressed to obtain the second echo suppression signal.

Since the echo suppression of the second audio signal in step 302 relies on the first echo suppression signal obtained in step 301, the second audio signal processed in step 302 is a delayed version of the second audio signal received by the audio receiver, so that causality in time is satisfied; the delay does not discard any part of the second audio signal. For example, if the delay parameter is D and the second audio signal received by the audio receiver is d₁(n), the second audio signal that is echo-suppressed in step 302 may be denoted as d₁(n-D).
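By way of non-limiting illustration, the delay of the second audio signal can be sketched in Python as follows. The function name `delay_signal` and the block-based, zero-filled handling of the first D samples are assumptions made for this sketch and are not taken from the embodiment.

```python
import numpy as np

def delay_signal(d1, D):
    """Return the sequence d1(n - D), zero-filled for n < D (block-based sketch)."""
    d1 = np.asarray(d1, dtype=float)
    if D < 0:
        raise ValueError("D must be non-negative")
    out = np.zeros_like(d1)
    if D < len(d1):
        # Sample n of the output is sample n - D of the input.
        out[D:] = d1[:len(d1) - D]
    return out

# Hypothetical usage: delay a short block by D = 2 samples.
delayed = delay_signal([1.0, 2.0, 3.0, 4.0], D=2)
```

In this block-based sketch the last D input samples fall outside the returned block; in a streaming implementation they would simply appear D samples later, so no part of the signal is discarded.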

In some embodiments of the present invention, if it is determined that the current state is the single-talk state and the average power of the first echo suppression signal is greater than the first preset threshold, or it is determined that the current state is the single-talk state and the average power of the first linear echo component is greater than the second preset threshold, the coefficient vector of the second adaptive filter may be further updated, and the first linear echo component is obtained by filtering the far-end audio signal using the first adaptive filter.
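As a hedged illustration, the update condition for the second adaptive filter can be written as a small Python predicate. The names `average_power` and `should_update_af2`, the use of a frame of recent samples as the averaging window, and the threshold arguments are assumptions for this sketch; the embodiment does not fix how the average power is computed.

```python
import numpy as np

def average_power(frame):
    """Mean squared value of a frame of samples (assumed averaging window)."""
    frame = np.asarray(frame, dtype=float)
    return float(np.mean(frame ** 2))

def should_update_af2(single_talk, e3_frame, y3_frame, thr_e3, thr_y3):
    """Gate for updating the second adaptive filter.

    single_talk: True if the current state is judged to be single-talk
    e3_frame:    recent samples of the first echo suppression signal
    y3_frame:    recent samples of the first linear echo component
    thr_e3:      first preset threshold (hypothetical value chosen by the caller)
    thr_y3:      second preset threshold (hypothetical value chosen by the caller)
    """
    if not single_talk:
        return False
    return average_power(e3_frame) > thr_e3 or average_power(y3_frame) > thr_y3
```

For example, `should_update_af2(True, [1.0, 1.0], [0.0, 0.0], 0.5, 0.5)` permits an update because the first condition holds, whereas any call with `single_talk=False` blocks the update.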

In some embodiments of the present invention, step 303 of suppressing the linear echo component and the residual nonlinear echo component in the second echo suppression signal may proceed as follows: the far-end audio signal is filtered by a third adaptive filter (such as AF1 shown in fig. 2) to obtain the second linear echo component; then, according to the second linear echo component, the linear echo component in the second echo suppression signal is suppressed to obtain a third echo suppression signal; and then, according to the second echo suppression signal and the third echo suppression signal, a nonlinear echo suppressor is used to suppress the residual nonlinear echo component in the third echo suppression signal.

In some embodiments of the present invention, if it is determined that the current state is the single-talk state, the coefficient vector of the third adaptive filter may be further updated.
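By way of non-limiting illustration, steps 301 to 303 can be chained sample by sample as in the Python sketch below. Several points are assumptions of this sketch rather than statements of the embodiment: "suppressing" is implemented as subtraction in all three steps; the names `x`, `e2`, `e1`, `tap_vector`, `nlms`, and `dmnlaec_sketch` are hypothetical; all three filters share one length `L` and one NLMS step size; a single per-sample `update` flag stands in for the single-talk decision and omits the power thresholds of the second filter; and the nonlinear echo suppressor is a caller-supplied function `nles(e2, e1)` that defaults to a pass-through.

```python
import numpy as np

def tap_vector(sig, n, L):
    """[sig(n), sig(n-1), ..., sig(n-L+1)], zero-padded before the start."""
    v = np.zeros(L)
    k = min(L, n + 1)
    v[:k] = sig[n::-1][:k]
    return v

def nlms(w, u, e, mu, delta):
    """One normalized LMS step (assumed learning rule for all three filters)."""
    return w + mu * e * u / (np.dot(u, u) + delta)

def dmnlaec_sketch(x, d2, d1, update, L=8, D=2, mu=0.5, delta=1e-6,
                   nles=lambda e2, e1: e1):
    """x: far-end signal; d2, d1: first and second audio signals (NumPy arrays)."""
    N = len(x)
    w3, w2, w1 = np.zeros(L), np.zeros(L), np.zeros(L)
    e3, e2, e1, out = (np.zeros(N) for _ in range(4))
    for n in range(N):
        xv = tap_vector(x, n, L)
        # Step 301: first adaptive filter (AF3) on the far-end signal.
        e3[n] = d2[n] - w3 @ xv
        # Step 302: second adaptive filter (AF2) on e3, applied to delayed d1.
        e3v = tap_vector(e3, n, L)
        d1_delayed = d1[n - D] if n >= D else 0.0
        e2[n] = d1_delayed - w2 @ e3v
        # Step 303: third adaptive filter (AF1) on the far-end signal, then NLES.
        e1[n] = e2[n] - w1 @ xv
        out[n] = nles(e2[n], e1[n])
        if update[n]:
            w3 = nlms(w3, xv, e3[n], mu, delta)
            w2 = nlms(w2, e3v, e2[n], mu, delta)
            w1 = nlms(w1, xv, e1[n], mu, delta)
    return out, e3, e2, e1
```

With `update` set to all `False`, every filter stays at zero, so `e3` equals `d2` and the output equals the delayed `d1`; this makes the data flow between the three stages easy to inspect before any adaptation is enabled.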

In order to more clearly describe the echo suppression method provided by the embodiment of the present invention as shown in fig. 3, the following will specifically describe the technical solution of echo suppression provided by the embodiment of the present invention in conjunction with the exemplary structure of the dual-microphone nonlinear acoustic echo canceller DMNLAEC provided by the embodiment of the present invention as shown in fig. 2.

Based on the example structure shown in fig. 2, in some embodiments of the invention, the audio receiver array may be a microphone array, and the microphone 205 and the microphone 206 may be two audio receivers located at different positions in the microphone array.

The audio signal received by the microphone 205 serves as the first audio signal and is denoted d₂(n); the audio signal received by the microphone 206 serves as the second audio signal and is denoted d₁(n); the far-end audio signal is the audio signal received by the speaker 204 from a far-end user. When the speaker 204 plays the received far-end audio signal, the first audio signal d₂(n) will include the echo signal that the microphone 205 picks up from the far-end audio signal played by the speaker 204, and the second audio signal d₁(n) will include the echo signal that the microphone 206 picks up from the far-end audio signal played by the speaker 204; n denotes the time instant n and is a positive integer.

In some embodiments of the invention, the Linear Acoustic Echo Canceller (LAEC) 201 first acquires the first audio signal d₂(n) received by the microphone 205, where the first audio signal d₂(n) may be understood as the first audio signal in step 301.

Further, the Linear Acoustic Echo Canceller (LAEC) 201 may be configured to perform the process, described in step 301, of suppressing the linear echo component in the first audio signal to obtain the first echo suppression signal. This process may also be understood as the LAEC 201 extracting the nonlinear echo component from the first audio signal d₂(n), the result being the first echo suppression signal. Specifically, the LAEC 201 may perform step 301 by means of an FIR adaptive filter (AF3).

The FIR adaptive filter (AF3) may first filter the far-end audio signal according to the following formula to estimate the first linear echo component:

$$y_3(n) = \mathbf{w}_3^{T}(n)\,\mathbf{x}(n)$$

where y₃(n) represents the first linear echo component; $\mathbf{w}_3(n)$ is the coefficient vector of the FIR adaptive filter (AF3) at time n; L₃ represents the dimension of the coefficient vector of the FIR adaptive filter (AF3) and is a positive integer; $\mathbf{x}(n) = [x(n), x(n-1), \ldots, x(n-L_3+1)]^{T}$ is the far-end audio signal vector at time n, with x(n) denoting the far-end audio signal sample at time n; and T is the transposition operator in vector operations.

Based on the resulting first linear echo component y₃(n), the Linear Acoustic Echo Canceller (LAEC) 201 may further suppress the linear echo component in the first audio signal d₂(n) by the formula e₃(n) = d₂(n) - y₃(n) to obtain the first echo suppression signal, where e₃(n) represents the first echo suppression signal.
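As a minimal sketch of this filtering and subtraction for a single time instant, the following Python function may help. The names `laec_output`, `w3`, and `x_vec`, and the numeric values in the usage lines, are hypothetical; `x_vec` is assumed to hold the L₃ most recent far-end samples, newest first.

```python
import numpy as np

def laec_output(w3, x_vec, d2_n):
    """Return (y3, e3) for one time instant.

    w3:    coefficient vector of AF3 (length L3)
    x_vec: far-end samples [x(n), x(n-1), ..., x(n-L3+1)] (assumed ordering)
    d2_n:  sample of the first audio signal d2(n)
    """
    y3 = float(np.dot(w3, x_vec))  # first linear echo component y3(n)
    e3 = d2_n - y3                 # first echo suppression signal e3(n)
    return y3, e3

# Hypothetical numbers, chosen only to show the arithmetic.
w3 = np.array([0.5, 0.25, 0.0])
x_vec = np.array([1.0, 2.0, 3.0])
y3, e3 = laec_output(w3, x_vec, d2_n=1.5)
```

Here the filter output is 0.5·1.0 + 0.25·2.0 + 0.0·3.0 = 1.0, so the sample of the first echo suppression signal is 1.5 - 1.0 = 0.5.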

It should be noted that, when the first audio signal d₂(n) does not include a near-end speech signal (the single-talk state), the first echo suppression signal e₃(n) obtained by the above procedure includes only the nonlinear echo component of the echo signal; when the first audio signal d₂(n) includes a near-end speech signal (the double-talk state), e₃(n) includes both the nonlinear echo component of the echo signal and the near-end speech signal.

Further, the coefficient vector of the FIR adaptive filter (AF3) may be updated when the current state is the single-talk state. In particular, the coefficient vector of the FIR adaptive filter (AF3) may be updated (or learned) using an adaptive learning algorithm such as the Normalized Least Mean Square (NLMS) algorithm, the Affine Projection Algorithm (APA), or the Recursive Least Squares (RLS) algorithm. In the single-talk state, the coefficient learning of AF3 continues; in the double-talk state, the coefficient learning of AF3 is stopped.

As an example, in some embodiments of the invention, based on the first echo suppression signal e₃(n) and the far-end audio signal vector $\mathbf{x}(n)$, the AF3 coefficient vector $\mathbf{h}_3(n)$ can be updated by the NLMS algorithm according to the following formula:

$$\mathbf{h}_3(n+1)=\begin{cases}\mathbf{h}_3(n)+\dfrac{\mu_3\,e_3(n)\,\mathbf{x}(n)}{\mathbf{x}^T(n)\,\mathbf{x}(n)+\delta_3}, & \text{single-talk state}\quad (i)\\[2ex]\mathbf{h}_3(n), & \text{double-talk state}\quad (ii)\end{cases}$$

where $\mathbf{h}_3(n)$ is the coefficient vector of the FIR adaptive filter (AF3) at time n and $\mathbf{h}_3(n+1)$ is its coefficient vector at time n+1; 0 < μ₃ < 1 is the update step-size parameter, and δ₃ > 0 is a regularization factor parameter. Formula (i) is used to update the coefficient vector of AF3 when the current state is judged to be the single-talk state; formula (ii) indicates that the coefficient vector of AF3 is not updated when the current state is judged to be the double-talk state.
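The AF3 stage described above can be sketched in a few lines. This is a minimal illustration that assumes the standard NLMS recursion; the function name `laec_nlms`, the default parameter values, and the per-sample `single_talk` flag array are assumptions made for the sketch and do not come from the original text.

```python
import numpy as np

def laec_nlms(x, d2, single_talk, L3=64, mu3=0.5, delta3=1e-6):
    """Sketch of the LAEC stage: FIR filtering plus gated NLMS update.

    x           : far-end audio signal samples
    d2          : first audio signal (microphone nearer the loudspeaker)
    single_talk : boolean per sample, True when no near-end speech is present
    Returns (e3, y3, h3): echo-suppressed signal, linear echo estimate,
    and the final AF3 coefficient vector.
    """
    h3 = np.zeros(L3)          # AF3 coefficient vector
    xbuf = np.zeros(L3)        # [x(n), x(n-1), ..., x(n-L3+1)]
    e3 = np.zeros(len(d2))
    y3 = np.zeros(len(d2))
    for n in range(len(d2)):
        xbuf = np.roll(xbuf, 1)
        xbuf[0] = x[n]
        y3[n] = h3 @ xbuf                  # first linear echo component
        e3[n] = d2[n] - y3[n]              # first echo suppression signal
        if single_talk[n]:                 # update only in single-talk
            h3 = h3 + mu3 * e3[n] * xbuf / (xbuf @ xbuf + delta3)
    return e3, y3, h3
```

With a purely linear echo path and the update enabled throughout, the coefficient vector should approach the echo path and e₃(n) should decay toward zero; with the update disabled, the coefficients stay at their initial values.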

Further, based on the exemplary structure of an embodiment of the invention shown in fig. 2, in some embodiments of the invention, the first echo suppression signal e₃(n) obtained by means of the FIR adaptive filter (AF3) may be fed into the adaptive null former (ANF) 202 for further processing.

The nonlinear echo component in the first audio signal d₂(n) received by the microphone 205 is correlated with the nonlinear echo component in the second audio signal d₁(n) received by the microphone 206. In order to concentrate the null direction of the adaptive null former (ANF), which is formed by the FIR adaptive filter (AF2) and the microphone 206, on the nonlinear echo component in the second audio signal d₁(n), in step 302 the first echo suppression signal e₃(n), which contains the nonlinear echo component of the first audio signal d₂(n), is optionally fed into the FIR adaptive filter (AF2) as its input. The coefficient vector of AF2 is updated only when the current state is the single-talk state and the nonlinear echo component is large, where a large nonlinear echo component may mean that its energy is higher than a preset threshold.

It should be noted that, in some embodiments of the present invention, the distance between the microphones 205 and 206 is short, and this closely spaced two-microphone array may be placed in either broadside or end-fire mode. When a near-end target audio signal is present, the null of the adaptive null former formed by the FIR adaptive filter (AF2) and the microphone 206 will also attenuate a near-end speech signal arriving from the null direction.

The placement of the two microphones affects the characteristics of the adaptive null former (ANF): a two-microphone ANF placed in broadside mode has a relatively wider null than one placed in end-fire mode, and is therefore more likely to attenuate the near-end speech signal. An optional arrangement is thus to place the two microphones (e.g., microphone 205 and microphone 206 shown in fig. 2) in an end-fire manner, so as to avoid or reduce the attenuation of the near-end audio signal as much as possible. The microphone 205 is located closer to the audio player (such as the loudspeaker 204 shown in fig. 2), and the linear acoustic echo canceller (LAEC) 201 is located on its branch; the microphone 206 is located farther from the loudspeaker 204, and the adaptive null former (ANF) and the nonlinear acoustic echo canceller (NLAEC) are located on its branch.

Specifically, the adaptive null former (ANF) on the one hand obtains the second audio signal d₁(n) received by the microphone 206, which can be understood as the second audio signal in step 301; on the other hand, it receives the first echo suppression signal e₃(n). It then executes step 302, suppressing the nonlinear echo component in the second audio signal according to the first echo suppression signal to obtain a second echo suppression signal.

Further, the adaptive null former (ANF) 202 may perform the process of step 302 by means of the FIR adaptive filter (AF2) included therein. Specifically, AF2 may first filter the received first echo suppression signal vector $\mathbf{e}_3(n)$ according to the following formula to estimate a first nonlinear echo component:

$$y_2(n)=\mathbf{h}_2^T(n)\,\mathbf{e}_3(n)$$

where y₂(n) denotes the first nonlinear echo component; $\mathbf{h}_2(n)$ is the coefficient vector of the FIR adaptive filter (AF2) at time n; L₂, a positive integer, is the dimension of that coefficient vector; and $\mathbf{e}_3(n)$ is the first echo suppression signal vector at time n.

Based on the resulting first nonlinear echo component y₂(n), the adaptive null former (ANF) 202 may further suppress the nonlinear echo component in the delayed second audio signal d₁(n-D) by the formula e₂(n) = d₁(n-D) - y₂(n), obtaining the second echo suppression signal e₂(n).

Here d₁(n-D) denotes the second audio signal d₁(n) delayed according to the delay parameter D. Because the second audio signal d₁(n) acquired by the adaptive null former (ANF) at time n does not take part in the echo suppression immediately, the ANF delays d₁(n) (the D delay block shown in fig. 2) to satisfy causality, yielding the delayed second audio signal d₁(n-D). D may generally be chosen as D = round(L₂/2).

Specifically, the coefficient vector of the FIR adaptive filter (AF2) is updated when the current state is the single-talk state and the energy of the nonlinear echo component is higher than a preset threshold. Considering that the power of the linear component in the acoustic echo is proportional to the power of its corresponding nonlinear component, the condition that the energy of the nonlinear echo component is higher than the preset threshold may be expressed as the average power of the first linear echo component being greater than a second preset threshold, or as the average power of the first echo suppression signal being greater than a first preset threshold, and so on. Accordingly, if the current state is determined to be the single-talk state and either of these power conditions holds, the coefficient vector of AF2 may be updated.

It should be noted that the manner of determining that the nonlinear echo component is large includes, but is not limited to, the manner provided by the embodiments of the present invention; for example, the determination may be based on any physical quantity, or combination of quantities, that reflects the energy or power of the nonlinear echo.

Specifically, similarly to the update of the coefficient vector of the FIR adaptive filter (AF3) described previously, the coefficient vector of the FIR adaptive filter (AF2) may be updated using an adaptive learning algorithm such as NLMS or APA.

As an example, in some embodiments of the invention, based on the first echo suppression signal vector $\mathbf{e}_3(n)$ and the second echo suppression signal e₂(n), the AF2 coefficient vector can be updated by the NLMS algorithm according to the following formula:

$$\mathbf{h}_2(n+1)=\begin{cases}\mathbf{h}_2(n)+\dfrac{\mu_2\,e_2(n)\,\mathbf{e}_3(n)}{\mathbf{e}_3^T(n)\,\mathbf{e}_3(n)+\delta_2}, & \text{single-talk state and large nonlinear echo}\quad (i)\\[2ex]\mathbf{h}_2(n), & \text{otherwise}\quad (ii)\end{cases}$$

where $\mathbf{h}_2(n)$ is the coefficient vector of the FIR adaptive filter (AF2) at time n and $\mathbf{h}_2(n+1)$ is its coefficient vector at time n+1; 0 < μ₂ < 1 is the update step-size parameter, and δ₂ > 0 is a regularization factor parameter. Formula (i) is used to update the coefficient vector of AF2 when the current state is judged to be the single-talk state and the nonlinear echo component is further judged to be large (for example, the average power of the first linear echo component is greater than a second preset threshold, or the average power of the first echo suppression signal is greater than a first preset threshold); formula (ii) indicates that the coefficient vector of AF2 is not updated when the condition of formula (i) is not satisfied.
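The ANF stage (filtering of e₃(n) by AF2, subtraction from the delayed second audio signal, and the gated coefficient update) can be sketched as follows. This is an illustration only, assuming the standard NLMS recursion; the names `anf_step` and `run_anf`, the default parameter values, and the `update_flags` array (True when the state is single-talk and the nonlinear echo is judged large) are assumptions of the sketch.

```python
import numpy as np

def anf_step(h2, e3_buf, d1_delayed, update, mu2=0.5, delta2=1e-6):
    """One time step of the ANF stage.

    h2         : AF2 coefficient vector at time n
    e3_buf     : [e3(n), e3(n-1), ..., e3(n-L2+1)]
    d1_delayed : delayed second audio signal d1(n-D)
    update     : True only in single-talk with a large nonlinear echo
    Returns (e2, h2_next).
    """
    y2 = h2 @ e3_buf                 # first nonlinear echo component
    e2 = d1_delayed - y2             # second echo suppression signal
    if update:
        h2 = h2 + mu2 * e2 * e3_buf / (e3_buf @ e3_buf + delta2)
    return e2, h2

def run_anf(e3, d1, update_flags, L2=32, mu2=0.5, delta2=1e-6):
    """Run the ANF over whole signals with delay D = round(L2 / 2)."""
    D = (L2 + 1) // 2                # round(L2/2), halves rounded up
    h2 = np.zeros(L2)
    buf = np.zeros(L2)
    e2 = np.zeros(len(d1))
    for n in range(len(d1)):
        buf = np.roll(buf, 1)
        buf[0] = e3[n]
        d1_delayed = d1[n - D] if n >= D else 0.0
        e2[n], h2 = anf_step(h2, buf, d1_delayed, update_flags[n], mu2, delta2)
    return e2, h2
```

Placing the delay at roughly half the filter length lets AF2 model the relative path between the two microphones whether the nonlinear echo reaches the microphone 206 slightly earlier or slightly later than it appears in e₃(n).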

It should be noted that, after the second audio signal d₁(n) received by the microphone 206 has been processed by the adaptive null former (ANF), the second echo suppression signal e₂(n) output by the ANF contains a dominant linear echo component and a small amount of residual nonlinear echo. The signal e₂(n) can therefore be processed directly by a conventional nonlinear acoustic echo canceller (NLAEC) with satisfactory results.

The NLAEC can in fact be a cascade of an LAEC and a nonlinear echo suppressor (NLES): an FIR adaptive filter (AF1) with an L₁-dimensional coefficient vector estimates the linear echo component in e₂(n) and subtracts it from e₂(n), so that the dominant linear echo component is suppressed, and the NLES placed after it suppresses the residual nonlinear echo component.

Based on the exemplary structure of an embodiment of the present invention shown in fig. 2, in some embodiments of the present invention, the second echo suppression signal e₂(n) obtained by means of the FIR adaptive filter (AF2) may be fed into the nonlinear acoustic echo canceller (NLAEC) 203 for subsequent processing.

In particular, the nonlinear acoustic echo canceller (NLAEC) 203 may be configured to perform the process, in step 303, of suppressing the linear echo component and the residual nonlinear echo component in the second echo suppression signal. The NLAEC 203 may filter the far-end audio signal through the FIR adaptive filter (AF1) included therein to obtain a second linear echo component, and suppress the linear echo component in the second echo suppression signal according to the second linear echo component to obtain a third echo suppression signal; then, according to the second echo suppression signal and the third echo suppression signal, the nonlinear echo suppressor (NLES) included in the NLAEC 203 suppresses the residual nonlinear echo component in the third echo suppression signal.

In particular, the FIR adaptive filter (AF1) in the nonlinear acoustic echo canceller (NLAEC) 203 may first filter the received far-end audio signal vector $\mathbf{x}(n)$ according to the following formula to estimate a second linear echo component:

$$y_1(n)=\mathbf{h}_1^T(n)\,\mathbf{x}(n)$$

where y₁(n) denotes the second linear echo component; $\mathbf{h}_1(n)$ is the coefficient vector of the FIR adaptive filter (AF1) at time n; L₁, a positive integer, is the dimension of that coefficient vector; and $\mathbf{x}(n)$ is the far-end audio signal vector at time n.

Based on the obtained second linear echo component y₁(n), the nonlinear acoustic echo canceller (NLAEC) 203 may further suppress the linear echo component in the second echo suppression signal e₂(n) by the formula e₁(n) = e₂(n) - y₁(n), obtaining the third echo suppression signal e₁(n).

Specifically, similarly to the update of the coefficient vectors of the FIR adaptive filter (AF3) and the FIR adaptive filter (AF2) described previously, the coefficient vector $\mathbf{h}_1(n)$ of the FIR adaptive filter (AF1) may also be updated when the current state is judged to be the single-talk state, for example by using an adaptive learning algorithm such as NLMS, APA, or RLS.

As an example, in some embodiments of the invention, based on the third echo suppression signal e₁(n) and the far-end audio signal vector $\mathbf{x}(n)$, the AF1 coefficient vector can be updated by the NLMS algorithm according to the following formula:

$$\mathbf{h}_1(n+1)=\begin{cases}\mathbf{h}_1(n)+\dfrac{\mu_1\,e_1(n)\,\mathbf{x}(n)}{\mathbf{x}^T(n)\,\mathbf{x}(n)+\delta_1}, & \text{single-talk state}\quad (i)\\[2ex]\mathbf{h}_1(n), & \text{double-talk state}\quad (ii)\end{cases}$$

where $\mathbf{h}_1(n)$ is the coefficient vector of the FIR adaptive filter (AF1) at time n and $\mathbf{h}_1(n+1)$ is its coefficient vector at time n+1; 0 < μ₁ < 1 is the update step-size parameter, and δ₁ > 0 is a regularization factor parameter. Formula (i) is used to update the coefficient vector of AF1 when the current state is judged to be the single-talk state; formula (ii) indicates that the coefficient vector of AF1 is not updated when the current state is judged to be the double-talk state.

Further, the residual nonlinear echo component in the third echo suppression signal e₁(n) may be processed by the nonlinear echo suppressor (NLES) in the nonlinear acoustic echo canceller (NLAEC) 203. Since the residual nonlinear echo component is weak at this point, the suppression gain of the NLES is generally large (not more than 1 but close to 1), so if the third echo suppression signal e₁(n) contains the near-end target audio signal, the NLES attenuates it little or not at all. Specifically, the NLES may first generate a spectral correction gain in the frequency domain from its input signals, the second echo suppression signal e₂(n) and the third echo suppression signal e₁(n), and then correct the spectrum of e₁(n) with this gain so as to suppress the residual nonlinear echo in e₁(n).

Optionally, the nonlinear echo suppressor (NLES) in the nonlinear acoustic echo canceller (NLAEC) 203 may, according to a minimum mean square error criterion, first generate a spectral correction suppression gain G(m, k) in the frequency domain from its input signals, the second echo suppression signal e₂(n) and the third echo suppression signal e₁(n). G(m, k) is computed from $S_{e_1e_2}(m,k)$, the cross-power spectrum between e₁(n) and e₂(n), and $S_{e_1e_1}(m,k)$, the auto-power spectrum of the signal e₁(n), which are calculated respectively by the following formulas:

$$S_{e_1e_2}(m,k)=\lambda\,S_{e_1e_2}(m-1,k)+(1-\lambda)\,E_1(m,k)\,\mathrm{conj}\{E_2(m,k)\}$$

$$S_{e_1e_1}(m,k)=\lambda\,S_{e_1e_1}(m-1,k)+(1-\lambda)\,E_1(m,k)\,\mathrm{conj}\{E_1(m,k)\}$$

where E₂(m, k) is the short-time Fourier transform of the m-th data block of the second echo suppression signal e₂(n); E₁(m, k) is the short-time Fourier transform of the m-th data block of the third echo suppression signal e₁(n); k is the frequency-bin index; conj{·} is the complex conjugate operator; and λ is the smoothing factor constant, 0 < λ < 1, typically in the range 0.925 to 0.999.

Furthermore, based on the generated spectral correction suppression gain G(m, k), the short-time Fourier transform E₁(m, k) of the third echo suppression signal e₁(n) may be corrected according to the formula Y(m, k) = E₁(m, k) · G(m, k). An inverse short-time Fourier transform of Y(m, k) then yields the output signal y(n) of the nonlinear echo suppressor (NLES). When the received second audio signal includes both an echo signal and a near-end audio signal, this output is the near-end target audio signal, with the linear and nonlinear echo components suppressed, for output to the far-end user.
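The frequency-domain processing of the NLES can be sketched as follows. This is only an illustration under stated assumptions: the function name `nles_process`, the first-order recursive spectrum estimates, and in particular the gain form (magnitude of the cross-power spectrum divided by the auto-power spectrum of e₁(n), clipped to the range [0, 1]) are assumptions made for the sketch and are not formulas confirmed by the text. The short-time Fourier analysis and the inverse transform are assumed to happen outside the function.

```python
import numpy as np

def nles_process(E1, E2, lam=0.95, eps=1e-12):
    """Sketch of a per-bin spectral gain applied to E1.

    E1, E2 : complex arrays of shape (num_blocks, num_bins) holding the
             short-time Fourier transforms of e1(n) and e2(n)
    lam    : smoothing factor constant, 0 < lam < 1
    Returns (Y, G): corrected spectrum and the gain per block and bin.
    """
    num_blocks, num_bins = E1.shape
    S12 = np.zeros(num_bins, dtype=complex)   # cross-power spectrum estimate
    S11 = np.zeros(num_bins)                  # auto-power spectrum of e1
    Y = np.zeros_like(E1)
    G = np.zeros((num_blocks, num_bins))
    for m in range(num_blocks):
        S12 = lam * S12 + (1.0 - lam) * E1[m] * np.conj(E2[m])
        S11 = lam * S11 + (1.0 - lam) * np.abs(E1[m]) ** 2
        # ASSUMED gain form; clipped so the gain never exceeds 1
        G[m] = np.clip(np.abs(S12) / (S11 + eps), 0.0, 1.0)
        Y[m] = E1[m] * G[m]                   # Y(m, k) = E1(m, k) * G(m, k)
    return Y, G
```

When the two inputs are identical, which corresponds to the case where AF1 removes nothing, the assumed gain stays at 1 and the spectrum passes through unchanged.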

Further, in some embodiments of the present invention, in the above processes, the updating of the coefficient vectors of the FIR adaptive filter (AF3) in the Linear Acoustic Echo Canceller (LAEC) 201, the FIR adaptive filter (AF2) in the adaptive null former (ANF) 202, and the FIR adaptive filter (AF1) in the nonlinear acoustic echo canceller (NLAEC) 203 may be controlled by a controller.

Specifically, in some embodiments of the invention, the controller may implement control of the update of the coefficient vectors of the FIR adaptive filter (AF3), the FIR adaptive filter (AF2), and the FIR adaptive filter (AF1) by determining whether the current state is the double talk state or the single talk state.

Optionally, in some embodiments of the present invention, the controller may determine whether the current state is the single-talk state as follows:

Voiceprint feature vectors are extracted from the first audio signal and from the first linear echo component, respectively; the similarity between the voiceprint feature vector of the first audio signal and that of the first linear echo component is calculated; if the calculated similarity is greater than a preset threshold, the current state is judged to be the double-talk state, and otherwise the single-talk state. The similarity may be calculated with various algorithms; for example, the Euclidean distance between the two voiceprint feature vectors may be calculated and used to represent the similarity between them (a larger distance then means that the two signals differ more, as is expected when near-end speech is present).
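The decision rule above can be sketched as follows, using the Euclidean distance as the measure. The text does not specify how voiceprint feature vectors are extracted, so `band_energy_features` below is a hypothetical stand-in (level-normalized log band energies of one frame); it, the function name `is_double_talk`, and the threshold value used in any experiment are assumptions of the sketch.

```python
import numpy as np

def band_energy_features(frame, num_bands=8):
    """Hypothetical stand-in for a voiceprint feature extractor.

    Returns level-normalized log band energies of one signal frame.
    """
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    bands = np.array_split(spectrum, num_bands)
    feats = np.log(np.array([b.sum() for b in bands]) + 1e-12)
    return feats - feats.mean()      # remove the overall signal level

def is_double_talk(d2_frame, y3_frame, threshold):
    """Compare the first audio signal with the first linear echo component.

    Returns True (double-talk) when the Euclidean distance between the
    two feature vectors exceeds the threshold, else False (single-talk).
    """
    dist = np.linalg.norm(
        band_energy_features(d2_frame) - band_energy_features(y3_frame)
    )
    return bool(dist > threshold)
```

If the microphone signal is just a scaled copy of the echo estimate, the distance is close to zero and the frame is classified as single-talk; adding a strong component that is absent from the echo estimate changes the spectral shape and pushes the distance above the threshold.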

Based on the exemplary structure shown in fig. 2, fig. 4 is a schematic diagram illustrating the use of a controller to control the updating of the coefficient vectors of the FIR adaptive filters in the echo suppression technical solution provided by some embodiments of the present invention. Fig. 4 adds a controller 401 to the configuration shown in fig. 2; the controller 401 is configured to control the updating of the coefficient vectors of AF3 in LAEC 201, AF2 in ANF 202, and AF1 in NLAEC 203.

As shown in fig. 4, which includes an exemplary configuration of the controller 401, the controller 401 may obtain the first audio signal d₂(n) and the first linear echo component y₃(n); the controller 401 then extracts the voiceprint feature vectors of d₂(n) and of y₃(n), respectively, and calculates the similarity between the voiceprint feature vector of d₂(n) and that of y₃(n). If the calculated similarity is greater than the preset threshold, the controller 401 determines that the current state is the double-talk state; otherwise, the controller 401 determines that the current state is the single-talk state.

Further, after determining that the current state is the double-talk state or the single-talk state, the controller 401 may feed the determination result into AF3 in LAEC 201, AF2 in ANF 202, and AF1 in NLAEC 203, as indicated by the arrows pointing outward from the controller 401 in fig. 4.

Further, for AF3 in LAEC 201 and AF1 in NLAEC 203, the coefficient vector can be updated when currently in the single talk state and stopped when currently in the double talk state according to the current state fed by controller 401, and the specific updating process can be referred to above.

AF2 in ANF 202, by contrast, updates its coefficient vector only when the current state is the single-talk state and the nonlinear echo component is large; for AF2, the controller 401 is therefore also configured to determine whether the nonlinear echo component is large and to feed the determination result to ANF 202. As mentioned above, a large nonlinear echo component means that the energy of the nonlinear echo component is higher than a preset threshold. Specifically, in the echo suppression technical solution provided in the embodiment of the present invention, this condition may be expressed as the average power of the first echo suppression signal e₃(n) being greater than a first preset threshold, or as the average power of the first linear echo component y₃(n) being greater than a second preset threshold.

Further, the controller 401 may determine whether the nonlinear echo is large according to the average power of the first linear echo component y₃(n); for example, with the determination result represented by a nonlinear echo size state:

$$\text{nonlinear echo size state}=\begin{cases}\text{large}, & P_{y_3}(n)>\text{Threshold}_{LE}\\ \text{small}, & \text{otherwise}\end{cases}$$

where P_y3(n) is the average power of the first linear echo component y₃(n), which can be calculated according to the formula P_y3(n) = α·P_y3(n-1) + (1-α)·{y₃(n)}²; α is the smoothing factor constant, 0 < α < 1, typically in the range 0.925 to 0.999; and Threshold_LE denotes the second preset threshold.
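The recursive average-power estimate and the threshold comparison can be sketched as follows. The function name `nonlinear_echo_is_large`, the initial power of zero, and the default values of `alpha` and `threshold_le` are assumptions of the sketch; only the recursion P_y3(n) = α·P_y3(n-1) + (1-α)·{y₃(n)}² and the comparison against the second preset threshold are taken from the text.

```python
def nonlinear_echo_is_large(y3, alpha=0.95, threshold_le=0.01):
    """Return one boolean per sample: True when the smoothed power of the
    first linear echo component exceeds the second preset threshold.

    y3           : iterable of samples of the first linear echo component
    alpha        : smoothing factor constant, 0 < alpha < 1
    threshold_le : second preset threshold (value chosen by the designer)
    """
    p = 0.0                      # P_y3 before the first sample (assumed 0)
    flags = []
    for sample in y3:
        p = alpha * p + (1.0 - alpha) * sample * sample
        flags.append(p > threshold_le)
    return flags
```

A value of α close to 1 makes the estimate react slowly, so the state does not flip on isolated loud samples, at the cost of a delay before a sustained echo is recognized as large.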

Alternatively, the controller 401 may determine whether the nonlinear echo is large according to the average power of the first echo suppression signal e₃(n); for example, with the determination result represented by a nonlinear echo size state:

$$\text{nonlinear echo size state}=\begin{cases}\text{large}, & P_{NLE}(n)>\text{Threshold}_{NLE}\\ \text{small}, & \text{otherwise}\end{cases}$$

where P_NLE(n) is the average power of the first echo suppression signal e₃(n), calculated with the smoothing factor constant α, 0 < α < 1, typically in the range 0.925 to 0.999; Threshold_NLE denotes the first preset threshold; formula (i) is the calculation formula for the average power of e₃(n) when the current state is judged to be the single-talk state, and formula (ii) is the calculation formula for the average power of e₃(n) when the current state is judged to be the double-talk state.

Further, the controller 401 may feed back the determination result to the ANF 202 after the determination is made, and the AF2 in the ANF 202 performs the update of the coefficient vector when the current state is the single talk state and the nonlinear echo is large according to the current state fed by the controller 401 and the size state of the determined nonlinear echo, and stops the update of the coefficient vector when the above condition is not satisfied, and the specific update process may be as described above.

In summary, in the echo suppression method provided in the embodiment of the present invention, for a first audio signal and a second audio signal received by an audio receiver in an audio receiver array, a linear echo component in the first audio signal is suppressed to obtain a first echo suppression signal, a nonlinear echo component in the second audio signal is suppressed according to the first echo suppression signal to obtain a second echo suppression signal, and finally, the linear echo component and a residual nonlinear echo component in the second echo suppression signal are suppressed to implement suppression of linear and nonlinear acoustic echoes in the audio signal.

In addition, compared with the prior art, the technical scheme of echo suppression provided by the embodiment of the invention can effectively suppress linear and nonlinear acoustic echoes, and simultaneously has little damage to near-end voice signals or can achieve the effect of no damage under some ideal conditions.

For example, when the echo suppression method provided by some embodiments of the present invention is applied to a closely spaced two-microphone array, a Linear Acoustic Echo Canceller (LAEC) extracts the nonlinear echo from one microphone branch, and the extracted nonlinear echo component is passed as a reference signal to the adaptive null former (ANF) in the other microphone branch, so that the null of the ANF is focused on suppressing the nonlinear echo; the output of the ANF is then processed by a conventional nonlinear acoustic echo canceller (NLAEC), which removes its linear echo component while further suppressing the remaining weak nonlinear echo component. Applied in this way, the method not only effectively suppresses the acoustic echoes (both linear and nonlinear) in the signals received by the microphones, but also reduces the damage to near-end speech.

It should be noted that, in the present application, the technical solution of echo suppression provided by the embodiment of the present invention is mainly described in detail from a time domain perspective, and the specific implementation of the technical solution of echo suppression provided by the embodiment of the present invention includes but is not limited to the time domain implementation forms shown in the present application, for example, the implementation forms may also be frequency domain, sub-band domain, wavelet transform domain, and the like.

Based on the same technical concept, the embodiment of the present invention further provides an echo suppression device, which can be implemented in a software manner, a hardware manner, or a combination manner of software and hardware, and can execute the embodiment of the echo suppression method provided by the above embodiment of the present invention.

Optionally, the apparatus may be applied in the example structure provided by some embodiments of the invention as shown in fig. 2, or in the example structure provided by still other embodiments of the invention as shown in fig. 4; alternatively, the apparatus may be implemented by the example structure shown in fig. 2 or by the example structure shown in fig. 4.

Fig. 5 shows a schematic structural diagram of an echo suppressing device provided by some embodiments of the present invention. As shown in fig. 5, the device includes:

a first echo suppression unit 501, configured to suppress a linear echo component in a first audio signal to obtain a first echo suppression signal;

a second echo suppressing unit 502, configured to suppress a nonlinear echo component in a second audio signal according to the first echo suppressing signal, so as to obtain a second echo suppressing signal; wherein the first audio signal and the second audio signal are acquired by audio receivers in an audio receiver array;

a third echo suppressing unit 503, configured to suppress a linear echo component and a residual nonlinear echo component in the second echo suppressed signal.

Optionally, the first echo suppression unit 501 may specifically include:

the first adaptive filter 5011 is configured to filter the far-end audio signal to obtain a first linear echo component;

the first echo suppression module 5012 is configured to suppress a linear echo component in the first audio signal according to the first linear echo component, so as to obtain a first echo suppression signal.

Further, the first adaptive filter 5011 may also be used to update the coefficient vector when currently in the single-talk state.

In connection with the exemplary structure provided by some embodiments of the invention shown in fig. 2, the first echo suppressing unit 501 may be, specifically, a Linear Acoustic Echo Canceller (LAEC)201 shown in fig. 2; or in combination with the exemplary structure provided by the further embodiments of the present invention shown in fig. 4, the first echo suppressing unit 501 may specifically be the Linear Acoustic Echo Canceller (LAEC)201 shown in fig. 4; therefore, specific characteristics and functions of the first echo suppressing unit 501 can be referred to the description of the Linear Acoustic Echo Canceller (LAEC)201 in the foregoing embodiments, and will not be described herein again.

Optionally, the second echo suppressing unit 502 may specifically include:

the second adaptive filter 5021 is used for filtering the first echo suppression signal to obtain a first nonlinear echo component;

the second echo suppressing module 5022 is configured to suppress the nonlinear echo component in the second audio signal according to the first nonlinear echo component, so as to obtain a second echo suppressed signal.

Further, the second adaptive filter 5021 may be further configured to update the coefficient vector when the current state is a single talk state and the average power of the first echo suppression signal is greater than a first preset threshold, or when the current state is a single talk state and the average power of the first linear echo component is greater than a second preset threshold, where the first linear echo component is obtained by filtering the far-end audio signal with the first adaptive filter.

In connection with the exemplary structure provided by some embodiments of the invention shown in fig. 2, the second echo suppressing unit 502 may specifically be a component of the adaptive null former (ANF) 202 shown in fig. 2; or, in connection with the exemplary structure provided by further embodiments of the invention shown in fig. 4, the second echo suppressing unit 502 may specifically be a component of the adaptive null former (ANF) 202 shown in fig. 4. Therefore, for the specific characteristics and functions of the second echo suppressing unit 502, reference may be made to the description of the adaptive null former (ANF) 202 in the foregoing embodiments, which is not repeated here.

Optionally, the third echo suppressing unit 503 may specifically include:

a third adaptive filter 5031, configured to filter the far-end audio signal to obtain a second linear echo component;

a third echo suppression module 5032, configured to suppress, according to the second linear echo component, the linear echo component in the second echo suppression signal to obtain a third echo suppression signal;

a nonlinear echo suppressor 5033 configured to suppress a residual nonlinear echo component in the third echo suppressed signal according to the second echo suppressed signal and the third echo suppressed signal.

Further, the third adaptive filter 5031 may be further configured to update the coefficient vector when the current state is the single talk state.

In connection with the exemplary structure provided by some embodiments of the invention shown in fig. 2, the third echo suppression unit 503 may be, in particular, a component in the nonlinear acoustic echo canceller (NLAEC)203 shown in fig. 2; or in combination with the exemplary structures provided by the further embodiments of the present invention shown in fig. 4, the third echo suppressing unit 503 may be a component of the nonlinear acoustic echo canceller (NLAEC)203 shown in fig. 4; therefore, specific characteristics and functions of the third echo suppression unit 503 can be referred to the description of the nonlinear acoustic echo canceller (NLAEC)203 in the foregoing embodiment, and will not be described in detail here.

Further, the echo suppressing device provided in some embodiments of the present invention may further include:

a determining unit 504, configured to extract voiceprint feature vectors of the first audio signal and of the first linear echo component, respectively;

to calculate the similarity between the voiceprint feature vector of the first audio signal and the voiceprint feature vector of the first linear echo component; and

to determine that the current state is the double-talk state if the calculated similarity is greater than a preset threshold, and that the current state is the single-talk state otherwise.

Accordingly, in connection with the exemplary structure provided by some embodiments of the invention shown in fig. 4, the determining unit 504 may be a component in the controller 401 shown in fig. 4; therefore, specific characteristics and functions of the determining unit 504 can be referred to the description of the controller 401 in the foregoing embodiments, and will not be described herein again.

Optionally, the audio receivers in the audio receiver array are arranged in an end-fire array.

For a software implementation, the techniques may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software code may be stored in a memory unit and executed by a processor. The memory unit may be implemented within the processor or external to the processor.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. An echo suppression method, comprising:

2. The method of claim 1, wherein suppressing a linear echo component in a first audio signal to obtain a first echo suppressed signal comprises:

3. The method of claim 2, further comprising: updating the coefficient vector of the first adaptive filter if the current state is determined to be the single-talk state.

4. The method of claim 1, wherein suppressing a nonlinear echo component in a second audio signal according to the first echo suppressed signal to obtain a second echo suppressed signal comprises:

5. The method of claim 4, further comprising: updating the coefficient vector of the second adaptive filter if the current state is determined to be the single-talk state and the average power of the first echo suppressed signal is greater than a first preset threshold, or if the current state is determined to be the single-talk state and the average power of a first linear echo component is greater than a second preset threshold, wherein the first linear echo component is obtained by filtering the far-end audio signal with the first adaptive filter.

6. The method of claim 1, wherein suppressing linear echo components and residual nonlinear echo components in the second echo suppressed signal comprises:

7. The method of claim 6, further comprising: updating the coefficient vector of the third adaptive filter if the current state is determined to be the single-talk state.

8. The method of claim 3, 5 or 7, wherein whether the current state is the single-talk state is determined by:

9. The method of any of claims 1 to 7, wherein the audio receivers in the array of audio receivers are arranged in an end-fire array.

10. An echo suppression device, comprising:

11. The apparatus of claim 10, wherein the first echo suppression unit comprises:

12. The apparatus of claim 11, wherein the first adaptive filter is further configured to: update its coefficient vector when the current state is the single-talk state.

13. The apparatus of claim 10, wherein the second echo suppression unit comprises:

14. The apparatus of claim 13, wherein the second adaptive filter is further configured to: update its coefficient vector when the current state is the single-talk state and the average power of the first echo suppressed signal is greater than a first preset threshold, or when the current state is the single-talk state and the average power of a first linear echo component is greater than a second preset threshold, wherein the first linear echo component is obtained by filtering a far-end audio signal with a first adaptive filter.

15. The apparatus of claim 10, wherein the third echo suppression unit comprises:

16. The apparatus of claim 15, wherein the third adaptive filter is further configured to: update its coefficient vector when the current state is the single-talk state.

17. The apparatus of claim 12, 14 or 16, further comprising:

a determining unit, configured to: extract a voiceprint feature vector of the first audio signal and a voiceprint feature vector of the first linear echo component from the first audio signal and the first linear echo component, respectively; calculate a similarity between the voiceprint feature vector of the first audio signal and the voiceprint feature vector of the first linear echo component; and determine that the current state is the double-talk state if the calculated similarity is greater than a preset threshold, and otherwise determine that the current state is the single-talk state.