
CN118262737B - Method, system and storage medium for separating sound array voice signal from background noise - Google Patents

Method, system and storage medium for separating sound array voice signal from background noise Download PDF

Info

Publication number
CN118262737B
CN118262737B (Application CN202410449206.6A)
Authority
CN
China
Prior art keywords
array
signal
matrix
received
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410449206.6A
Other languages
Chinese (zh)
Other versions
CN118262737A (en)
Inventor
韩瑜
鲍彧
黄克迪
郭宁宇
顾恺然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou Institute of Technology
Original Assignee
Changzhou Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou Institute of Technology filed Critical Changzhou Institute of Technology
Priority to CN202410449206.6A priority Critical patent/CN118262737B/en
Publication of CN118262737A publication Critical patent/CN118262737A/en
Application granted granted Critical
Publication of CN118262737B publication Critical patent/CN118262737B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272: Voice signal separating
    • G10L21/028: Voice signal separating using properties of sound source
    • G10L21/0308: Voice signal separating characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a method, a system and a storage medium for separating an acoustic array voice signal from background noise, comprising the following steps: acquiring an acoustic array and calculating its steering vector; based on the positional correlation of the receiving acoustic array, obtaining a spatial feature matrix representing the positional relation of the array elements in the receiving acoustic array; performing feature decomposition on the spatial feature matrix to obtain a right feature matrix, taking the right feature matrix as a spatial feature observation matrix, and performing feature-domain projection observation on the voice signals received by the acoustic array; and establishing an angle dictionary of voice-signal arrival directions to represent the spatial-feature-domain projection signals of the acoustic array received signals, and establishing a joint optimization function to obtain the spatial-domain sparse feature matrix of the voice signal and the background clutter signal. The method can effectively improve the purity of the received voice signal and can also recover the environmental information carried in the background noise.

Description

Method, system and storage medium for separating sound array voice signal from background noise
Technical Field
The invention belongs to the technical field of processing of voice signals, and relates to a method, a system and a storage medium for separating an acoustic array voice signal from background noise.
Background
Background noise suppression is essential in speech array signal processing. First, suppressing background noise improves the clarity and quality of speech signals, especially in complex environments such as conference rooms or traffic scenes. Second, it significantly improves the accuracy of speech recognition systems and avoids misrecognition caused by noise. For voice communication systems, suppressing background noise improves communication quality and enhances the user experience. In complex environments, it helps the system cope with challenging conditions and ensures the robustness of the speech system. For voice control and interactive applications, noise suppression makes the system more responsive and improves the human-computer interaction experience. In general, background noise suppression in speech array signal processing optimizes speech signal quality and improves system accuracy and adaptability to meet practical demands in different environments. On the other hand, in some contexts the noise itself carries cues about changes in the surrounding environment, such as crowd noise or traffic noise. By extracting and analyzing the background noise, a system can better understand its surroundings and provide more intelligent services. Finally, given the diversity of background noise in practical applications, extracting it allows the system to adapt more flexibly to different environments. However, existing background noise extraction algorithms are little studied and cannot extract background noise effectively.
Disclosure of Invention
The invention aims to provide a method, a system and a storage medium for separating an acoustic array voice signal from background noise, which can effectively improve the purity of a received voice signal and can also obtain environment information carried in the background noise.
The technical solution for realizing the purpose of the invention is as follows:
a method of separating a sound array speech signal from background noise, comprising the steps of:
S01: acquiring an acoustic array and calculating a steering vector of the acoustic array;
S02: based on the position correlation of the receiving sound array, obtaining a space feature matrix representing the position relation of each array element in the receiving sound array;
S03: performing feature decomposition on the spatial feature matrix to obtain a right feature matrix, taking the right feature matrix as a spatial feature observation matrix, and performing feature domain projection observation on the voice signals received by the acoustic array;
S04: establishing an angle dictionary of voice-signal arrival directions to represent the spatial-feature-domain projection signals of the acoustic array received signals, and establishing a joint optimization function to obtain the spatial-domain sparse feature matrix of the voice signal and the background clutter signal.
In a preferred embodiment, the steering vector β of the acoustic array calculated in step S01 is:
where c is the sound velocity, θ is the signal direction angle, N is the number of receiving array elements, d_0 is the array element spacing, and f is the voice signal frequency;
the signal received by the nth receiving element is r_n, and in the presence of background noise, r_n is expressed as:
r_n = β(n)s_n + J_n
where s_n is the speech signal, J_n is the background noise, and β(n) is the nth element of the steering vector β.
In a preferred technical scheme, the spatial feature matrix in step S02 is:
where the elements in the matrix are:
where Δd_ij is the distance between the ith and jth array elements, and β(i) is the ith element of the steering vector β.
In a preferred technical scheme, the feature decomposition of the spatial feature matrix in step S03 is:
G = ΩΛΦ
where Ω is the left feature matrix, Λ is a diagonal matrix whose diagonal elements are the feature values of G, and Φ is the right feature matrix;
the feature-domain projection observation yields the spatial-feature-domain projection signal Y = ΦR = ΦβS + ΦJ;
where R is the total signal received by the array, S is the speech signal received by the array, J is the background signal received by the array, y_n is the feature-domain projection signal of the nth array element, r_n is the received signal of the nth array element, s_n is the speech signal received by the nth array element, J_n is the background signal received by the nth array element, and n = 1, 2, …, N.
In a preferred technical solution, the angle dictionary Ψ of voice-signal arrival directions established in step S04 is:
where φ_1, φ_2, …, φ_Q is the set of search arrival angles and Q is the number of search arrival angles.
In a preferred embodiment, the spatial-feature-domain projection signal of the acoustic array received signal in step S04 is further expressed as:
Y = ΦR = ΦΨSΞ + ΦJ
where Ξ is the spatial-domain sparse feature matrix of the voice signal and has the sparse property.
In a preferred technical solution, the joint optimization function established in step S04 is:
where ‖Ξ‖_0 is the zero norm of Ξ, ‖ΦJ‖_* is the nuclear norm, η and μ are adjustable hyper-parameters, and Π is the correlation characteristic coefficient of the received signals among the receiving array elements.
The invention also discloses a system for separating the voice signal of the acoustic array from the background noise, which comprises:
the acoustic array acquisition and calculation module acquires an acoustic array and calculates a steering vector of the acoustic array;
The space feature matrix calculation module is used for obtaining a space feature matrix representing the position relation of each array element in the receiving sound array based on the position correlation of the receiving sound array;
The characteristic domain projection observation calculation module is used for carrying out characteristic decomposition on the space characteristic matrix to obtain a right characteristic matrix, taking the right characteristic matrix as a space characteristic observation matrix and carrying out characteristic domain projection observation on the voice signals received by the acoustic array;
The separation module establishes an angle dictionary of voice-signal arrival directions to represent the spatial-feature-domain projection signals of the acoustic array received signals, and establishes a joint optimization function to obtain the spatial-domain sparse feature matrix of the voice signal and the background clutter signal.
In a preferred technical solution, the steering vector β of the acoustic array calculated in the acoustic array acquisition and calculation module is:
where c is the sound velocity, θ is the signal direction angle, N is the number of receiving array elements, d_0 is the array element spacing, and f is the voice signal frequency;
the signal received by the nth receiving element is r_n, and in the presence of background noise, r_n is expressed as:
r_n = β(n)s_n + J_n
where s_n is the speech signal, J_n is the background noise, and β(n) is the nth element of the steering vector β.
The invention also discloses a computer storage medium, on which a computer program is stored, which when executed realizes the above-mentioned method for separating the acoustic array voice signal from the background noise.
Compared with the prior art, the invention has the remarkable advantages that:
The method can more effectively separate the undistorted voice signal from the background noise by utilizing the voice signal sparse feature decomposition and feature domain projection observation method, and is applicable to different environments. The purity of the received voice signal can be effectively improved, and meanwhile, the environment information carried in the background noise can be obtained. The background noise can be effectively extracted and correspondingly adjusted when necessary, so that stable and efficient voice processing performance can be provided in various complex scenes.
Drawings
FIG. 1 is a flow chart of a method of separating an acoustic array speech signal from background noise in accordance with the present invention;
fig. 2 is a schematic block diagram of a system for separating acoustic array speech signals from background noise in accordance with the present invention.
Detailed Description
The principle of the invention is as follows: the space feature matrix is subjected to feature decomposition to obtain a right feature matrix, the right feature matrix is used as a space feature observation matrix, feature domain projection observation is performed on voice signals received by the acoustic array, and a joint optimization function is established to obtain a space domain sparse feature matrix and a background clutter signal of the voice signals, so that the purity of the received voice signals can be effectively improved, and meanwhile, the environment information carried in the background noise can be obtained. The background noise can be effectively extracted and correspondingly adjusted when necessary, so that stable and efficient voice processing performance can be provided in various complex scenes.
Example 1:
As shown in fig. 1, a method for separating a sound array speech signal from background noise includes the following steps:
S01: acquiring an acoustic array and calculating a steering vector of the acoustic array;
S02: based on the position correlation of the receiving sound array, obtaining a space feature matrix representing the position relation of each array element in the receiving sound array;
S03: performing feature decomposition on the spatial feature matrix to obtain a right feature matrix, taking the right feature matrix as a spatial feature observation matrix, and performing feature domain projection observation on the voice signals received by the acoustic array;
S04: establishing an angle dictionary of voice-signal arrival directions to represent the spatial-feature-domain projection signals of the acoustic array received signals, and establishing a joint optimization function to obtain the spatial-domain sparse feature matrix of the voice signal and the background clutter signal.
In a preferred embodiment, the steering vector β of the acoustic array is calculated in step S01 as:
where c is the sound velocity, θ is the signal direction angle, N is the number of receiving array elements, d_0 is the array element spacing, and f is the voice signal frequency;
the signal received by the nth receiving element is r_n, and in the presence of background noise, r_n is expressed as:
r_n = β(n)s_n + J_n
where s_n is the speech signal, J_n is the background noise, and β(n) is the nth element of the steering vector β.
In a preferred embodiment, the spatial feature matrix in step S02 is:
where the elements in the matrix are:
where Δd_ij is the distance between the ith and jth array elements, and β(i) is the ith element of the steering vector β.
In a preferred embodiment, in step S03 the feature decomposition of the spatial feature matrix is:
G = ΩΛΦ
where Ω is the left feature matrix, Λ is a diagonal matrix whose diagonal elements are the feature values of G, and Φ is the right feature matrix;
the feature-domain projection observation yields the spatial-feature-domain projection signal Y = ΦR = ΦβS + ΦJ;
where R is the total signal received by the array, S is the speech signal received by the array, J is the background signal received by the array, y_n is the feature-domain projection signal of the nth array element, r_n is the received signal of the nth array element, s_n is the speech signal received by the nth array element, J_n is the background signal received by the nth array element, and n = 1, 2, …, N.
In a preferred embodiment, the angle dictionary Ψ of voice-signal arrival directions established in step S04 is:
where φ_1, φ_2, …, φ_Q is the set of search arrival angles and Q is the number of search arrival angles.
In a preferred embodiment, the spatial-feature-domain projection signal of the acoustic array received signal in step S04 is further expressed as:
Y = ΦR = ΦΨSΞ + ΦJ
where Ξ is the spatial-domain sparse feature matrix of the voice signal and has the sparse property.
In a preferred embodiment, the joint optimization function established in step S04 is:
where ‖Ξ‖_0 is the zero norm of Ξ, ‖ΦJ‖_* is the nuclear norm, η and μ are adjustable hyper-parameters, and Π is the correlation characteristic coefficient of the received signals among the receiving array elements.
In another embodiment, a computer storage medium has a computer program stored thereon, and the computer program when executed implements the above method for separating an acoustic array speech signal from background noise.
The method for separating the acoustic array voice signal from the background noise can be any one of the above methods for separating the acoustic array voice signal from the background noise, and detailed description thereof is omitted herein.
In yet another embodiment, as shown in fig. 2, a system for separating an acoustic array speech signal from background noise, includes:
The acoustic array acquisition and calculation module 10 acquires an acoustic array and calculates a steering vector of the acoustic array;
The space feature matrix calculation module 20 obtains a space feature matrix representing the position relation of each array element in the receiving sound array based on the position correlation of the receiving sound array;
The feature domain projection observation calculation module 30 performs feature decomposition on the spatial feature matrix to obtain a right feature matrix, takes the right feature matrix as a spatial feature observation matrix, and performs feature domain projection observation on the voice signals received by the acoustic array;
The separation module 40 establishes an angle dictionary of voice-signal arrival directions to represent the spatial-feature-domain projection signals of the acoustic array received signals, and establishes a joint optimization function to obtain the spatial-domain sparse feature matrix of the voice signal and the background clutter signal.
Specifically, the workflow of the system for separating the acoustic array speech signal from background noise is described below by way of a preferred embodiment:
Consider a receiving acoustic array with N receiving elements at spacing d_0. Its receiving steering vector can be expressed as:
β = [1, e^(-j2πf·d_0·sinθ/c), …, e^(-j2πf·(N-1)·d_0·sinθ/c)]^T (1)
where c is the speed of sound, θ is the signal direction angle, and f is the speech signal frequency.
The signal received by the nth receiving element is r_n; in the presence of background noise, r_n may be expressed as:
r_n = β(n)s_n + J_n (2)
where s_n is the speech signal, J_n is the background noise, and β(n) is the nth element of the receiving steering vector β.
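The array model above can be sketched numerically. The snippet below is a minimal illustration that assumes the standard uniform-linear-array steering vector β(n) = e^(-j2πf·n·d_0·sinθ/c); the patent's own equation image is not reproduced in this text, so that exact form, and all parameter values, are assumptions for illustration only:

```python
import numpy as np

def steering_vector(n_elems, d0, theta_deg, f, c=343.0):
    """Assumed standard steering vector of an N-element uniform linear array."""
    n = np.arange(n_elems)
    theta = np.deg2rad(theta_deg)
    return np.exp(-1j * 2 * np.pi * f * n * d0 * np.sin(theta) / c)

# Synthetic received signal r_n = beta(n) * s_n + J_n over T snapshots
N, T = 8, 256
rng = np.random.default_rng(0)
beta = steering_vector(N, d0=0.05, theta_deg=20.0, f=1000.0)
s = rng.standard_normal(T)              # placeholder speech snapshots
J = 0.1 * rng.standard_normal((N, T))   # placeholder background noise
R = beta[:, None] * s[None, :] + J      # N x T matrix of element signals
```

Each row of `R` is one element's received signal; the whole matrix plays the role of the array observation processed in the following steps.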
This embodiment defines, based on the positional correlation of the receiving acoustic array, a spatial feature matrix G representing the positional relation of the array elements in the receiving acoustic array:
where the elements in the matrix are:
where Δd_ij is the distance between the ith and jth array elements.
The spatial feature matrix indicating the positional relation of the array elements in the receiving acoustic array may also be obtained by other methods, which are not limited here.
Subsequently, the spatial feature matrix G is subjected to feature decomposition G = ΩΛΦ and its right feature matrix Φ is obtained. The invention takes the right feature matrix Φ as the spatial feature observation matrix and performs feature-domain projection observation on the voice signals received by the acoustic array, i.e.
Y = ΦR = ΦβS + ΦJ (5)
where R, S and J stack the per-element received signals r_n, speech signals s_n and background signals J_n, and Y stacks the feature-domain projection signals y_n, n = 1, 2, …, N.
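The decomposition-and-projection step can be sketched as follows. The patent's exact expression for the spatial feature matrix (built from the spacings Δd_ij and the steering vector) is not reproduced in this text, so a Hermitian placeholder G is used; the point of the sketch is the factorization G = ΩΛΦ and the projection Y = ΦR:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 8, 64
R = rng.standard_normal((N, T)) + 1j * rng.standard_normal((N, T))

# Placeholder Hermitian spatial feature matrix (exact patent form assumed away)
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
G = A @ A.conj().T / N

# Feature decomposition G = Omega Lambda Phi: for Hermitian G, eigh gives
# G = V diag(w) V^H, so Omega = V, Lambda = diag(w), right matrix Phi = V^H.
w, V = np.linalg.eigh(G)
Omega, Lam, Phi = V, np.diag(w), V.conj().T

# Feature-domain projection observation Y = Phi R
Y = Phi @ R
```

For a Hermitian G the right feature matrix is unitary, so the projection is invertible and no information in R is lost.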
The correlation characteristic coefficient Π of the received signals among the receiving array elements is:
where σ is a similarity constraint parameter and S_i, S_j are the speech signals received by the ith and jth array elements. The correlation characteristic coefficient characterizes the correlation of the pure speech signals received by different receiving array elements: the smaller the value of Π, the higher the correlation.
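Since the exact expression for Π is not reproduced in this text, the sketch below is a hypothetical stand-in consistent with the description: an average pairwise dissimilarity of the per-element speech signals, scaled by the similarity constraint parameter σ, that is smaller when the signals are more correlated. The function name and formula are illustrative assumptions, not the patent's definition:

```python
import numpy as np

def correlation_coefficient(S, sigma=1.0):
    """Hypothetical Pi: mean pairwise dissimilarity of per-element speech
    signals; smaller values indicate higher inter-element correlation."""
    N = S.shape[0]
    total, pairs = 0.0, 0
    for i in range(N):
        for j in range(i + 1, N):
            num = np.abs(np.vdot(S[i], S[j]))
            den = np.linalg.norm(S[i]) * np.linalg.norm(S[j]) + 1e-12
            total += (1.0 - num / den) / sigma   # 0 when rows are collinear
            pairs += 1
    return total / pairs
```

Identical signals at every element give Π ≈ 0 (maximal correlation), while independent signals give a strictly positive value, matching the "smaller Π, higher correlation" convention in the text.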
Finally, define the angle dictionary Ψ of voice-signal arrival directions:
where φ_1, φ_2, …, φ_Q is the set of search arrival angles and Q is the number of search arrival angles.
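Building the angle dictionary can be sketched as stacking one steering vector per search arrival angle. The steering-vector form and all parameter values are assumptions (the patent's equation images are not reproduced here):

```python
import numpy as np

def steering_vector(n_elems, d0, theta_deg, f, c=343.0):
    # Assumed standard ULA steering vector (exact patent form not reproduced)
    n = np.arange(n_elems)
    return np.exp(-1j * 2 * np.pi * f * n * d0 *
                  np.sin(np.deg2rad(theta_deg)) / c)

# Angle dictionary Psi: one column per search arrival angle phi_1..phi_Q
N, Q = 8, 181
search_angles = np.linspace(-90.0, 90.0, Q)
Psi = np.stack([steering_vector(N, 0.05, a, 1000.0)
                for a in search_angles], axis=1)   # N x Q dictionary
```

A signal arriving from one of the grid angles is then (approximately) a single column of `Psi`, which is what makes the coefficient matrix Ξ sparse.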
The spatial-feature-domain projection signal of the acoustic array received signal represented by equation (5) may be further represented as:
Y = ΦR = ΦΨSΞ + ΦJ (9)
where Ξ, the spatial-domain sparse feature matrix of the voice signal, has the sparse property.
Therefore, the invention formulates the separation of the acoustic array voice signal from the background noise as the following joint optimization problem:
where ‖Ξ‖_0 is the zero norm of Ξ and represents its sparse characteristic, ‖ΦJ‖_* is the nuclear norm and represents the low-rank characteristic of the projected background signal ΦJ, and η and μ are adjustable hyper-parameters.
The above expression can be solved with the ADMM (alternating direction method of multipliers) algorithm, or with other known algorithms. After solving, the spatial-domain sparse feature matrix Ξ of the voice signal and the background clutter signal J are obtained, realizing the joint separation of the voice signal and the background noise.
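The solution step above can be sketched with a generic alternating proximal scheme of the kind used in robust PCA and commonly solved via ADMM. This is an illustrative sketch, not the patent's exact iteration: the zero norm is relaxed to its usual ℓ1 surrogate handled by element-wise soft-thresholding, the nuclear-norm term is handled by singular value thresholding, and `lam`/`mu` stand in for the adjustable hyper-parameters:

```python
import numpy as np

def soft_threshold(X, tau):
    """Proximal operator of tau*||.||_1 (element-wise shrinkage)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding: proximal operator of tau*||.||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def separate_sparse_lowrank(Y, lam=0.1, mu=1.0, iters=100):
    """Alternate proximal updates splitting Y into sparse + low-rank parts."""
    S = np.zeros_like(Y)   # sparse component (speech-feature side)
    L = np.zeros_like(Y)   # low-rank component (background side)
    for _ in range(iters):
        L = svt(Y - S, 1.0 / mu)             # low-rank update
        S = soft_threshold(Y - L, lam / mu)  # sparse update
    return S, L
```

The sparse output plays the role of the projected speech term and the low-rank output the role of the projected background clutter; recovering Ξ itself would additionally require inverting the dictionary representation ΦΨS.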
The method can effectively improve the purity of the received voice signal and can also obtain the environment information carried in the background noise.
The foregoing examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to the foregoing examples, and any other changes, modifications, substitutions, combinations, and simplifications that do not depart from the spirit and principles of the present invention should be made therein and are intended to be equivalent substitutes within the scope of the present invention.

Claims (7)

1. A method for separating a speech signal from background noise in an acoustic array, comprising the steps of:
S01: acquiring an acoustic array and calculating a steering vector of the acoustic array;
S02: based on the position correlation of the receiving sound array, obtaining a space feature matrix representing the position relation of each array element in the receiving sound array;
S03: performing feature decomposition on the spatial feature matrix to obtain a right feature matrix, taking the right feature matrix as a spatial feature observation matrix, and performing feature-domain projection observation on the voice signals received by the acoustic array; the feature decomposition of the spatial feature matrix is:
G = ΩΛΦ
where Ω is the left feature matrix, Λ is a diagonal matrix whose diagonal elements are the feature values of G, and Φ is the right feature matrix;
the feature-domain projection observation yields the spatial-feature-domain projection signal Y = ΦR = ΦβS + ΦJ;
where R is the total signal received by the array, S is the speech signal received by the array, J is the background signal received by the array, y_n is the feature-domain projection signal of the nth array element, r_n is the received signal of the nth array element, s_n is the speech signal received by the nth array element, J_n is the background signal received by the nth array element, and n = 1, 2, …, N;
S04: establishing an angle dictionary of voice-signal arrival directions to represent the spatial-feature-domain projection signals of the acoustic array received signals, and establishing a joint optimization function to obtain the spatial-domain sparse feature matrix of the voice signal and the background clutter signal; the spatial-feature-domain projection signal of the acoustic array received signal is further represented as:
Y = ΦR = ΦΨSΞ + ΦJ
where Ξ is the spatial-domain sparse feature matrix of the voice signal and has the sparse property;
the established joint optimization function is:
where ‖Ξ‖_0 is the zero norm of Ξ, ‖ΦJ‖_* is the nuclear norm, η and μ are adjustable hyper-parameters, Π is the correlation characteristic coefficient of the received signals among the receiving array elements, and Ψ is the angle dictionary.
2. The method of claim 1, wherein the steering vector of the acoustic array calculated in step S01 is:
where c is the sound velocity, θ is the signal direction angle, N is the number of receiving array elements, d_0 is the array element spacing, and f is the voice signal frequency;
the signal received by the nth receiving element is r_n, and in the presence of background noise, r_n is expressed as:
r_n = β(n)s_n + J_n
where s_n is the speech signal, J_n is the background noise, and β(n) is the nth element of the steering vector β.
3. The method of claim 2, wherein the spatial feature matrix in step S02 is:
where the elements in the matrix are:
where Δd_ij is the distance between the ith and jth array elements, and β(i) is the ith element of the steering vector β.
4. The method for separating acoustic array speech signals from background noise according to claim 1, wherein the angle dictionary established in step S04 is:
where φ_1, φ_2, …, φ_Q is the set of search arrival angles and Q is the number of search arrival angles.
5. A system for separating a sound array speech signal from background noise, comprising:
the acoustic array acquisition and calculation module acquires an acoustic array and calculates a steering vector of the acoustic array;
The space feature matrix calculation module is used for obtaining a space feature matrix representing the position relation of each array element in the receiving sound array based on the position correlation of the receiving sound array;
the feature-domain projection observation calculation module performs feature decomposition on the spatial feature matrix to obtain a right feature matrix, takes the right feature matrix as a spatial feature observation matrix, and performs feature-domain projection observation on the voice signals received by the acoustic array; the feature decomposition of the spatial feature matrix is:
G = ΩΛΦ
where Ω is the left feature matrix, Λ is a diagonal matrix whose diagonal elements are the feature values of G, and Φ is the right feature matrix;
the feature-domain projection observation yields the spatial-feature-domain projection signal Y = ΦR = ΦβS + ΦJ;
where R is the total signal received by the array, S is the speech signal received by the array, J is the background signal received by the array, y_n is the feature-domain projection signal of the nth array element, r_n is the received signal of the nth array element, s_n is the speech signal received by the nth array element, J_n is the background signal received by the nth array element, and n = 1, 2, …, N;
the separation module establishes an angle dictionary of voice-signal arrival directions to represent the spatial-feature-domain projection signals of the acoustic array received signals, and establishes a joint optimization function to obtain the spatial-domain sparse feature matrix of the voice signal and the background clutter signal; the spatial-feature-domain projection signal of the acoustic array received signal is further represented as:
Y = ΦR = ΦΨSΞ + ΦJ
where Ξ is the spatial-domain sparse feature matrix of the voice signal and has the sparse property;
the established joint optimization function is:
where ‖Ξ‖_0 is the zero norm of Ξ, ‖ΦJ‖_* is the nuclear norm, η and μ are adjustable hyper-parameters, Π is the correlation characteristic coefficient of the received signals among the receiving array elements, and Ψ is the angle dictionary.
6. The system for separating a sound array speech signal from background noise according to claim 5, wherein the steering vector of the acoustic array calculated by the acoustic array acquisition and calculation module is:
where c is the sound velocity, θ is the signal direction angle, N is the number of receiving array elements, d_0 is the array element spacing, and f is the voice signal frequency;
the signal received by the nth receiving element is r_n, and in the presence of background noise, r_n is expressed as:
r_n = β(n)s_n + J_n
where s_n is the speech signal, J_n is the background noise, and β(n) is the nth element of the steering vector β.
7. A computer storage medium having stored thereon a computer program, which when executed implements the method of separating an acoustic array speech signal from background noise of any of claims 1-4.
CN202410449206.6A 2024-04-15 2024-04-15 Method, system and storage medium for separating sound array voice signal from background noise Active CN118262737B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410449206.6A CN118262737B (en) 2024-04-15 2024-04-15 Method, system and storage medium for separating sound array voice signal from background noise

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410449206.6A CN118262737B (en) 2024-04-15 2024-04-15 Method, system and storage medium for separating sound array voice signal from background noise

Publications (2)

Publication Number Publication Date
CN118262737A CN118262737A (en) 2024-06-28
CN118262737B true CN118262737B (en) 2024-10-29

Family

ID=91606738

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410449206.6A Active CN118262737B (en) 2024-04-15 2024-04-15 Method, system and storage medium for separating sound array voice signal from background noise

Country Status (1)

Country Link
CN (1) CN118262737B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102915742A (en) * 2012-10-30 2013-02-06 中国人民解放军理工大学 Single-channel monitor-free voice and noise separating method based on low-rank and sparse matrix decomposition
CN111724806A (en) * 2020-06-05 2020-09-29 太原理工大学 Double-visual-angle single-channel voice separation method based on deep neural network

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10957337B2 (en) * 2018-04-11 2021-03-23 Microsoft Technology Licensing, Llc Multi-microphone speech separation
CN114879133A (en) * 2022-04-27 2022-08-09 西安电子科技大学 Sparse angle estimation method under multipath and Gaussian color noise environment
CN115171716B (en) * 2022-06-14 2024-04-19 武汉大学 Continuous voice separation method and system based on spatial feature clustering and electronic equipment

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102915742A (en) * 2012-10-30 2013-02-06 中国人民解放军理工大学 Single-channel monitor-free voice and noise separating method based on low-rank and sparse matrix decomposition
CN111724806A (en) * 2020-06-05 2020-09-29 太原理工大学 Double-visual-angle single-channel voice separation method based on deep neural network

Also Published As

Publication number Publication date
CN118262737A (en) 2024-06-28

Similar Documents

Publication Publication Date Title
CN112088402B (en) Federated neural network for speaker recognition
US8583428B2 (en) Sound source separation using spatial filtering and regularization phases
CN107534725B (en) Voice signal processing method and device
AU2022200439B2 (en) Multi-modal speech separation method and system
CN109599124A (en) A kind of audio data processing method, device and storage medium
CN111462733B (en) Multi-modal speech recognition model training method, device, equipment and storage medium
US11817112B2 (en) Method, device, computer readable storage medium and electronic apparatus for speech signal processing
CN109410956B (en) Object identification method, device, equipment and storage medium of audio data
US20220115002A1 (en) Speech recognition method, speech recognition device, and electronic equipment
Wang et al. Deep learning assisted time-frequency processing for speech enhancement on drones
EP4207195A1 (en) Speech separation method, electronic device, chip and computer-readable storage medium
CN110111808A (en) Acoustic signal processing method and Related product
CN110188179B (en) Voice directional recognition interaction method, device, equipment and medium
CN112466327B (en) Voice processing method and device and electronic equipment
CN109308909B (en) Signal separation method and device, electronic equipment and storage medium
CN112786072A (en) Ship classification and identification method based on propeller radiation noise
KR20230134613A (en) Multi channel voice activity detection
CN116385546A (en) Multi-mode feature fusion method for simultaneously segmenting and detecting grabbing pose
CN118262737B (en) Method, system and storage medium for separating sound array voice signal from background noise
US20240371387A1 (en) Area sound pickup method and system of small microphone array device
CN115171716B (en) Continuous voice separation method and system based on spatial feature clustering and electronic equipment
CN114664288A (en) Voice recognition method, device, equipment and storage medium
CN115662394A (en) Voice extraction method, device, storage medium and electronic device
CN117198311A (en) Voice control method and device based on voice noise reduction
CN112489678B (en) Scene recognition method and device based on channel characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant