CN104599679A - Speech signal based focus covariance matrix construction method and device - Google Patents
- Publication number
- CN104599679A CN104599679A CN201510052368.7A CN201510052368A CN104599679A CN 104599679 A CN104599679 A CN 104599679A CN 201510052368 A CN201510052368 A CN 201510052368A CN 104599679 A CN104599679 A CN 104599679A
- Authority
- CN
- China
- Prior art keywords
- matrix
- covariance matrix
- sampling frequency
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
Abstract
The invention discloses a speech signal based focus covariance matrix construction method and device. The method includes: determining the sampling frequency points adopted when a microphone array collects a voice signal; for any one of the determined sampling frequency points, calculating a first covariance matrix of the voice signal acquired at that sampling frequency point, a focus transformation matrix, and the conjugate transpose matrix of the focus transformation matrix, and taking the product of the first covariance matrix, the focus transformation matrix and the conjugate transpose matrix of the focus transformation matrix as the focus covariance matrix of the voice signal acquired at that sampling frequency point; and taking the sum of the focus covariance matrices calculated for all sampling frequency points as the focus covariance matrix of the voice signal. Because the incidence angle of the sound source does not need to be predicted when the focus covariance matrix is constructed, and such prediction inevitably carries errors, the accuracy of the constructed focus covariance matrix is improved.
Description
Technical Field
The invention relates to the technical field of voice signal processing, in particular to a method and a device for constructing a focus covariance matrix based on voice signals.
Background
Compared with a single microphone, the microphone array can utilize time domain and frequency domain information of a sound source and also can utilize spatial information of the sound source, so that the microphone array has the advantages of strong anti-interference capability, flexible application and the like, has stronger advantages in the aspects of solving the problems of sound source positioning, speech enhancement, speech recognition and the like, and is widely applied to the fields of audio and video conference systems, vehicle-mounted systems, hearing-aid devices, human-computer interaction systems, robot systems, security monitoring, military reconnaissance and the like.
In microphone-array-based speech processing technology, the number of sound sources often needs to be known in order to obtain high processing performance; if the number of sound sources is unknown, or is assumed to be too large or too small, the accuracy of the processing result of the speech acquired by the microphone array is degraded.
In order to improve the accuracy of the processing result of the speech acquired by the microphone array, a method for estimating the number of sound sources has been proposed, and a focus covariance matrix needs to be constructed in the course of this estimation. At present, however, constructing the focus covariance matrix requires first predicting the incident angle of the sound source; the focus covariance matrix is then constructed according to the predicted incident angle, and the number of sound sources is estimated. If the error in the predicted incident angle of the sound source is large, the accuracy of the constructed focus covariance matrix is low.
Disclosure of Invention
The embodiment of the invention provides a method and a device for constructing a focusing covariance matrix based on a voice signal, which are used for overcoming the prior-art defect that the constructed focusing covariance matrix has low accuracy.
The embodiment of the invention provides the following specific technical scheme:
in a first aspect, a method for constructing a focus covariance matrix based on a speech signal is provided, which includes:
determining sampling frequency points adopted when a microphone array collects voice signals;
aiming at any one of the determined sampling frequency points, calculating a first covariance matrix, a focusing transformation matrix and a conjugate transpose matrix of the focusing transformation matrix of the voice signal acquired at the any one sampling frequency point, and taking the product of the first covariance matrix, the focusing transformation matrix and the conjugate transpose matrix of the focusing transformation matrix as the focusing covariance matrix of the voice signal acquired at the any one sampling frequency point;
and taking the sum of the focus covariance matrixes of the voice signals acquired at the sampling frequency points as the focus covariance matrix of the voice signals acquired by the microphone array.
With reference to the first aspect, in a first possible implementation manner, the calculating the first covariance matrix specifically includes:
calculating the first covariance matrix as follows:

$\hat{R}(k) = \frac{1}{P}\sum_{i=0}^{P-1} X_i(k)\,X_i^H(k)$

wherein $\hat{R}(k)$ represents the first covariance matrix, k represents the arbitrary sampling frequency point, P represents the number of frames of the speech signal collected by the microphone array, $X_i(k)$ represents the Discrete Fourier Transform (DFT) value of the microphone array at the i-th frame and the k-th sampling frequency point, $X_i^H(k)$ represents the conjugate transpose matrix of $X_i(k)$, and N represents the number of sampling frequency points included in any one frame, the number of sampling frequency points included in any two different frames being the same.
With reference to the first aspect and the first possible implementation manner of the first aspect, in a second possible implementation manner, before the calculating the focus transformation matrix, the method further includes:
determining a focusing frequency point of sampling frequency points adopted when the microphone array collects voice signals;
calculating a second covariance matrix of the voice signals collected by the microphone array at the focusing frequency point;
calculating the focus transformation matrix specifically comprises:
decomposing an eigenvalue of the first covariance matrix to obtain a first eigenvector matrix, and performing conjugate transpose on the first eigenvector matrix to obtain a conjugate transpose matrix of the first eigenvector matrix;
decomposing the eigenvalue of the second covariance matrix to obtain a second eigenvector matrix;
and taking the product of the conjugate transpose matrix of the first eigenvector matrix and the second eigenvector matrix as the focusing transformation matrix.
With reference to the second possible implementation manner of the first aspect, in a third possible implementation manner, the calculating the second covariance matrix specifically includes:
calculating the second covariance matrix as follows:

$\hat{R}(k_0) = \frac{1}{P}\sum_{i=0}^{P-1} X_i(k_0)\,X_i^H(k_0)$

wherein $\hat{R}(k_0)$ represents the second covariance matrix, $k_0$ represents the focusing frequency point, P represents the number of frames of the speech signal collected by the microphone array, $X_i(k_0)$ represents the DFT value of the microphone array at the i-th frame and the focusing frequency point, and $X_i^H(k_0)$ represents the conjugate transpose matrix of $X_i(k_0)$.
With reference to the second or third possible implementation manner of the first aspect, in a fourth possible implementation manner, decomposing the eigenvalue of the first covariance matrix specifically includes:
decomposing eigenvalues for the first covariance matrix in the following manner:

$\hat{R}(k) = U(k)\,\Lambda\,U^H(k)$

wherein $\hat{R}(k)$ represents the first covariance matrix, U(k) represents the first eigenvector matrix of $\hat{R}(k)$, $\Lambda$ represents the diagonal matrix formed by the eigenvalues of $\hat{R}(k)$ arranged from large to small, and $U^H(k)$ represents the conjugate transpose matrix of U(k).
With reference to the second to fourth possible implementation manners of the first aspect, in a fifth possible implementation manner, decomposing the eigenvalue of the second covariance matrix specifically includes:
decomposing eigenvalues for the second covariance matrix in the following manner:

$\hat{R}(k_0) = U(k_0)\,\Lambda_0\,U^H(k_0)$

wherein $\hat{R}(k_0)$ represents the second covariance matrix, $U(k_0)$ represents the second eigenvector matrix of $\hat{R}(k_0)$, $\Lambda_0$ represents the diagonal matrix formed by the eigenvalues of $\hat{R}(k_0)$ arranged from large to small, and $U^H(k_0)$ represents the conjugate transpose matrix of $U(k_0)$.
With reference to the first to fifth possible implementation manners of the first aspect, in a sixth possible implementation manner, $X_i(k)$ is in the following form:

$X_i(k) = [X_{i1}(k), X_{i2}(k), \ldots, X_{iL}(k)]^T,\ i = 0, 1, 2, \ldots, P-1$

wherein $X_{i1}(k)$ represents the DFT value of the 1st array element of the microphone array at the i-th frame and the k-th sampling frequency point, $X_{i2}(k)$ represents the DFT value of the 2nd array element of the microphone array at the i-th frame and the k-th sampling frequency point, ..., $X_{iL}(k)$ represents the DFT value of the L-th array element of the microphone array at the i-th frame and the k-th sampling frequency point, and L is the number of array elements included in the microphone array.
In a second aspect, an apparatus for constructing a focus covariance matrix based on a speech signal is provided, including:
the determining unit is used for determining sampling frequency points adopted when the microphone array collects voice signals;
the first calculation unit is used for, for any one of the determined sampling frequency points, calculating a first covariance matrix of the voice signal acquired at that sampling frequency point, a focus transformation matrix and the conjugate transpose matrix of the focus transformation matrix, and taking the product of the first covariance matrix, the focus transformation matrix and the conjugate transpose matrix of the focus transformation matrix as the focus covariance matrix of the voice signal acquired at that sampling frequency point;
and the second calculation unit is used for taking the sum of the focus covariance matrixes of the voice signals acquired at the sampling frequency points as the focus covariance matrix of the voice signals acquired by the microphone array.
With reference to the second aspect, in a first possible implementation manner, when the first calculating unit calculates the first covariance matrix, specifically:
calculating the first covariance matrix as follows:

$\hat{R}(k) = \frac{1}{P}\sum_{i=0}^{P-1} X_i(k)\,X_i^H(k)$

wherein $\hat{R}(k)$ represents the first covariance matrix, k represents the arbitrary sampling frequency point, P represents the number of frames of the speech signal collected by the microphone array, $X_i(k)$ represents the Discrete Fourier Transform (DFT) value of the microphone array at the i-th frame and the k-th sampling frequency point, $X_i^H(k)$ represents the conjugate transpose matrix of $X_i(k)$, and N represents the number of sampling frequency points included in any one frame, the number of sampling frequency points included in any two different frames being the same.
With reference to the second aspect and the first possible implementation manner of the second aspect, in a second possible implementation manner, the determining unit is further configured to determine a focusing frequency point of a sampling frequency point adopted when the microphone array collects a voice signal;
the first calculation unit is further configured to calculate a second covariance matrix of the voice signals collected by the microphone array at the focused frequency point;
when the first calculating unit calculates the focus transformation matrix, the method specifically includes:
decomposing an eigenvalue of the first covariance matrix to obtain a first eigenvector matrix, and performing conjugate transpose on the first eigenvector matrix to obtain a conjugate transpose matrix of the first eigenvector matrix;
decomposing the eigenvalue of the second covariance matrix to obtain a second eigenvector matrix;
and taking the product of the conjugate transpose matrix of the first eigenvector matrix and the second eigenvector matrix as the focusing transformation matrix.
With reference to the second possible implementation manner of the second aspect, in a third possible implementation manner, when the first calculating unit calculates the second covariance matrix, specifically:
calculating the second covariance matrix as follows:

$\hat{R}(k_0) = \frac{1}{P}\sum_{i=0}^{P-1} X_i(k_0)\,X_i^H(k_0)$

wherein $\hat{R}(k_0)$ represents the second covariance matrix, $k_0$ represents the focusing frequency point, P represents the number of frames of the speech signal collected by the microphone array, $X_i(k_0)$ represents the DFT value of the microphone array at the i-th frame and the focusing frequency point, and $X_i^H(k_0)$ represents the conjugate transpose matrix of $X_i(k_0)$.
With reference to the second or third possible implementation manner of the second aspect, in a fourth possible implementation manner, when the first calculating unit decomposes the eigenvalue of the first covariance matrix, specifically:
decomposing eigenvalues for the first covariance matrix in the following manner:

$\hat{R}(k) = U(k)\,\Lambda\,U^H(k)$

wherein $\hat{R}(k)$ represents the first covariance matrix, U(k) represents the first eigenvector matrix of $\hat{R}(k)$, $\Lambda$ represents the diagonal matrix formed by the eigenvalues of $\hat{R}(k)$ arranged from large to small, and $U^H(k)$ represents the conjugate transpose matrix of U(k).
With reference to the second to fourth possible implementation manners of the second aspect, in a fifth possible implementation manner, when the first calculating unit decomposes the eigenvalue of the second covariance matrix, specifically:
decomposing eigenvalues for the second covariance matrix in the following manner:

$\hat{R}(k_0) = U(k_0)\,\Lambda_0\,U^H(k_0)$

wherein $\hat{R}(k_0)$ represents the second covariance matrix, $U(k_0)$ represents the second eigenvector matrix of $\hat{R}(k_0)$, $\Lambda_0$ represents the diagonal matrix formed by the eigenvalues of $\hat{R}(k_0)$ arranged from large to small, and $U^H(k_0)$ represents the conjugate transpose matrix of $U(k_0)$.
With reference to the first to fifth possible implementation manners of the second aspect, in a sixth possible implementation manner, $X_i(k)$ is in the following form:

$X_i(k) = [X_{i1}(k), X_{i2}(k), \ldots, X_{iL}(k)]^T,\ i = 0, 1, 2, \ldots, P-1$

wherein $X_{i1}(k)$ represents the DFT value of the 1st array element of the microphone array at the i-th frame and the k-th sampling frequency point, $X_{i2}(k)$ represents the DFT value of the 2nd array element of the microphone array at the i-th frame and the k-th sampling frequency point, ..., $X_{iL}(k)$ represents the DFT value of the L-th array element of the microphone array at the i-th frame and the k-th sampling frequency point, and L is the number of array elements included in the microphone array.
The invention has the following beneficial effects:
the main idea of constructing the focus covariance matrix based on the voice signal provided by the embodiment of the invention is as follows: determining sampling frequency points adopted when a microphone array collects voice signals; aiming at any one of the determined sampling frequency points, calculating a first covariance matrix, a focusing transformation matrix and a conjugate transpose matrix of the focusing transformation matrix of the voice signals acquired at any one of the sampling frequency points, and taking the product of the first covariance matrix, the focusing transformation matrix and the conjugate transpose matrix of the focusing transformation matrix as a focusing covariance matrix of the voice signals acquired at any one of the sampling frequency points; according to the scheme, the sum of the focus covariance matrixes of the voice signals acquired at each sampling frequency point is used as the focus covariance matrix of the voice signals, and in the scheme, when the focus covariance matrix is constructed, the incident angle of a sound source does not need to be predicted, and an error exists when the incident angle of the sound source is predicted, so that the accuracy of the constructed focus covariance matrix is improved.
Drawings
FIG. 1A is a flow chart of a method for constructing a focus covariance matrix based on a speech signal according to an embodiment of the invention;
FIG. 1B is a diagram illustrating frame shifting according to an embodiment of the present invention;
FIG. 1C is a schematic diagram of a comparison of the CSM-GDE calculated number of sound sources with the number of sound sources calculated according to an embodiment of the present invention;
FIG. 1D is a schematic diagram of another comparison of the CSM-GDE computed number of sound sources with the computed number of sound sources provided by an embodiment of the present invention;
FIG. 2 is an embodiment of constructing a focus covariance matrix based on a speech signal in accordance with an embodiment of the present invention;
FIG. 3A is a schematic structural diagram of an apparatus for constructing a focus covariance matrix based on a speech signal according to an embodiment of the present invention;
FIG. 3B is a schematic structural diagram of another apparatus for constructing a focus covariance matrix based on a speech signal according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The term "and/or" herein merely describes an association between associated objects, and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter associated objects are in an "or" relationship.
The preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are merely for illustrating and explaining the present invention, and are not intended to limit the present invention, and that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Referring to fig. 1A, in the embodiment of the present invention, a process of constructing a focus covariance matrix based on a speech signal is as follows:
step 100: determining sampling frequency points adopted when a microphone array collects voice signals;
step 110: aiming at any one of the determined sampling frequency points, calculating a first covariance matrix, a focusing transformation matrix and a conjugate transpose matrix of a focusing transformation matrix of the voice signal acquired at any one of the sampling frequency points, and taking the product of the first covariance matrix, the focusing transformation matrix and the conjugate transpose matrix of the focusing transformation matrix as a focusing covariance matrix of the voice signal acquired at any one of the sampling frequency points;
step 120: and taking the sum of the focus covariance matrixes of the voice signals acquired at the sampling frequency points as the focus covariance matrix of the voice signals acquired by the microphone array.
In the embodiment of the present invention, in order to improve the accuracy of the constructed focus covariance matrix, after acquiring a speech signal acquired by a microphone array at any sampling frequency point, before calculating a first covariance matrix, a focus transformation matrix, and a conjugate transpose matrix of the focus transformation matrix of the speech signal acquired at any sampling frequency point, the following operations are further included:
pre-emphasis processing is carried out on the collected voice signals;
at this time, the first covariance matrix, the focus transformation matrix, and the conjugate transpose matrix of the focus transformation matrix of the speech signal collected at any sampling frequency point are calculated, optionally, the following method may be adopted:
pre-emphasis processing is carried out on the voice signals collected at any sampling frequency point;
and calculating a first covariance matrix, a focusing transformation matrix and a conjugate transpose matrix of the focusing transformation matrix of the voice signal after pre-emphasis processing.
In the embodiment of the present invention, optionally, the speech signal may be pre-emphasized in the following manner:

$y(k) = x(k) - a\,x(k-1),\ k = 1, 2, \ldots, N-1$

wherein y(k) is the pre-emphasized speech signal at the k-th sampling frequency point, x(k) is the speech signal acquired at the k-th sampling frequency point, x(k-1) is the speech signal acquired at the (k-1)-th sampling frequency point, N is the number of sampling frequency points, and a is the pre-emphasis coefficient; optionally, a = 0.9375.
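As a hedged illustration of this step: the patent's own formula image is not reproduced in this text, so the standard first-order pre-emphasis filter implied by the variables listed above (difference with coefficient a = 0.9375) is assumed:

```python
def pre_emphasis(x, a=0.9375):
    """First-order pre-emphasis: y[n] = x[n] - a * x[n-1], with y[0] = x[0]
    (the handling of the first sample is an illustrative assumption)."""
    return [x[0]] + [x[n] - a * x[n - 1] for n in range(1, len(x))]
```

For a constant input of 1.0, every output sample after the first equals 1 − 0.9375 = 0.0625, which shows how the filter attenuates low-frequency content while leaving rapid changes intact.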
Wherein, optionally, $X_i(k)$ is in the form shown in formula two:

$X_i(k) = [X_{i1}(k), X_{i2}(k), \ldots, X_{iL}(k)]^T,\ i = 0, 1, 2, \ldots, P-1$ (formula two)

wherein $X_{i1}(k)$ represents the DFT value of the 1st array element of the microphone array at the i-th frame and the k-th sampling frequency point, $X_{i2}(k)$ represents the DFT value of the 2nd array element of the microphone array at the i-th frame and the k-th sampling frequency point, ..., $X_{iL}(k)$ represents the DFT value of the L-th array element of the microphone array at the i-th frame and the k-th sampling frequency point, L is the number of array elements included in the microphone array, and P represents the number of frames of the speech signal collected by the microphone array.
In the embodiment of the present invention, in order to improve the accuracy of the constructed focus covariance matrix, after acquiring a voice signal acquired by a microphone array at any sampling frequency point, before calculating a first covariance matrix, a focus transformation matrix, and a conjugate transpose matrix of the focus transformation matrix of the voice signal acquired at any sampling frequency point, the following operations are further included:
performing framing processing on the collected voice signals;
when the first covariance matrix, the focus transformation matrix, and the conjugate transpose matrix of the focus transformation matrix of the speech signal acquired at any sampling frequency point are calculated, optionally, the following method may be adopted:
performing frame processing on the voice signals collected at any sampling frequency point;
and calculating a first covariance matrix, a focusing transformation matrix and a conjugate transpose matrix of the focusing transformation matrix of the voice signal subjected to framing processing.
In the embodiment of the present invention, framing is performed in an overlapping manner, that is, adjacent frames overlap; the overlapped portion is called the frame shift. Optionally, the frame shift is selected to be half of the frame length. The overlapping of frames is shown in fig. 1B.
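A minimal sketch of this overlapped framing, assuming (as stated above) a frame shift of half the frame length:

```python
def frame_signal(x, frame_len):
    """Split x into overlapping frames with a frame shift of half the
    frame length, so adjacent frames share frame_len // 2 samples."""
    shift = frame_len // 2
    return [x[i:i + frame_len]
            for i in range(0, len(x) - frame_len + 1, shift)]
```

For example, an 8-sample signal with a frame length of 4 yields three frames starting at samples 0, 2, and 4, each sharing two samples with its neighbor.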
In the embodiment of the present invention, in order to further improve the accuracy of the constructed focus covariance matrix, after the framing processing is performed on the received speech signal, the windowing processing needs to be performed on the speech signal subjected to the framing processing.
When performing windowing on the speech signal subjected to framing processing, the following method can be adopted:

the speech signal after framing is multiplied by a Hamming window function w(n). Optionally, the Hamming window function w(n) is shown in formula three:

$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right),\ n = 0, 1, \ldots, N-1$ (formula three)

wherein k is any sampling frequency point, N represents the number of sampling frequency points included in any one frame, and the number of sampling frequency points included in any two different frames is the same.
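The windowing step can be sketched as below; since the patent's own formula image is not preserved in this text, the common symmetric Hamming definition w(n) = 0.54 − 0.46·cos(2πn/(N−1)) is assumed:

```python
import math

def hamming(N):
    """Symmetric Hamming window of length N (assumed standard form)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1))
            for n in range(N)]

def window_frame(frame):
    """Multiply one framed speech segment by the Hamming window."""
    return [s * wn for s, wn in zip(frame, hamming(len(frame)))]
```

The window equals 0.08 at both endpoints and 1.0 at the center, which tapers each frame's edges and reduces spectral leakage in the subsequent DFT.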
In practical applications, some of the speech signals collected by the microphone array may be emitted by a target object and some by a non-target object. For example, in a meeting there is some noise before a speaker speaks, which is a speech signal emitted by a non-target object; once the speaker starts to speak, the speech signals collected by the microphone array are emitted by the target object. The accuracy of a focusing covariance matrix constructed from the speech signals emitted by the target object is higher. Therefore, in the embodiment of the present invention, after the speech signals collected by the microphone array are obtained, and before calculating the first covariance matrix, the focusing transformation matrix, and the conjugate transpose matrix of the focusing transformation matrix of the speech signal collected at any sampling frequency point, the following operations are further included:
calculating the energy value of the voice signal collected at any sampling frequency point and any frame;
determining a frame where the voice signal with the corresponding energy value reaching a preset energy threshold value is located;
when the first covariance matrix, the focus transformation matrix, and the conjugate transpose matrix of the focus transformation matrix of the speech signal acquired at any sampling frequency point are calculated, optionally, the following method may be adopted:
and calculating a first covariance matrix, a focusing transformation matrix and a conjugate transpose matrix of the focusing transformation matrix of the voice signals collected at any sampling frequency point and the determined frame.
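A minimal sketch of this energy gate; the threshold value and the energy definition (sum of squared samples per frame) are illustrative assumptions:

```python
def select_frames(frames, threshold):
    """Return indices of frames whose energy (sum of squared samples)
    reaches the preset energy threshold - frames below it are treated
    as non-target noise and excluded from covariance estimation."""
    return [i for i, f in enumerate(frames)
            if sum(s * s for s in f) >= threshold]
```

Only the frames whose indices are returned would then enter the first-covariance-matrix calculation described above.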
In the embodiment of the present invention, there are various ways to calculate the first covariance matrix, and optionally, the following ways may be adopted:
the first covariance matrix is calculated as follows:

$\hat{R}(k) = \frac{1}{P}\sum_{i=0}^{P-1} X_i(k)\,X_i^H(k)$

wherein $\hat{R}(k)$ represents the first covariance matrix, k represents any sampling frequency point, P represents the number of frames of the speech signal collected by the microphone array, $X_i(k)$ represents the DFT (Discrete Fourier Transform) value of the microphone array at the i-th frame and the k-th sampling frequency point, $X_i^H(k)$ represents the conjugate transpose matrix of $X_i(k)$, and N represents the number of sampling frequency points included in any one frame, the number of sampling frequency points included in any two different frames being the same.
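The per-bin covariance average can be sketched as follows, with X_k holding the P per-frame DFT vectors at one sampling frequency point (the array layout is an illustrative assumption):

```python
import numpy as np

def covariance_at_bin(X_k):
    """R(k) = (1/P) * sum_i X_i(k) X_i(k)^H, where X_k is a (P, L)
    complex array: one row per frame, one column per microphone."""
    P = X_k.shape[0]
    return sum(np.outer(X_k[i], X_k[i].conj()) for i in range(P)) / P
```

For two frames [1, j] and [1, −j], the cross terms cancel and the average is the identity matrix, which is a convenient sanity check.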
In the embodiment of the present invention, before calculating the focus transformation matrix, the following operations are further included:
determining a focusing frequency point of sampling frequency points adopted when a microphone array collects voice signals;
calculating a second covariance matrix of the voice signals collected by the microphone array at the focusing frequency point;
at this time, when calculating the focus transformation matrix, optionally, the following manner may be adopted:
decomposing the eigenvalue of the first covariance matrix to obtain a first eigenvector matrix, and performing conjugate transpose on the first eigenvector matrix to obtain a conjugate transpose matrix of the first eigenvector matrix;
decomposing the eigenvalue of the second covariance matrix to obtain a second eigenvector matrix;
and taking the product of the conjugate transpose matrix of the first eigenvector matrix and the second eigenvector matrix as a focusing transformation matrix.
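A sketch of this construction using NumPy's Hermitian eigendecomposition. Note that the translated claim leaves the multiplication order ambiguous, so the order T(k) = U(k0)·U(k)^H is assumed here, which maps the eigenvector basis of bin k onto that of the focusing frequency point:

```python
import numpy as np

def focusing_matrix(R_k, R_k0):
    """Build the focusing transformation from the first covariance matrix
    R_k (at bin k) and the second covariance matrix R_k0 (at the focusing
    frequency point). Eigenvectors are ordered by descending eigenvalue,
    as the patent specifies for the diagonal matrices Lambda, Lambda_0."""
    def eigvecs_desc(R):
        vals, vecs = np.linalg.eigh(R)  # eigh returns ascending eigenvalues
        return vecs[:, ::-1]            # reorder columns to descending
    U_k = eigvecs_desc(R_k)
    U_k0 = eigvecs_desc(R_k0)
    return U_k0 @ U_k.conj().T          # assumed order: U(k0) U^H(k)
```

When the two covariance matrices coincide, the transformation reduces to the identity, and in general it is unitary, since both eigenvector matrices are.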
In the embodiment of the present invention, when calculating the second covariance matrix, optionally, the following manner may be adopted:
the second covariance matrix is calculated as follows:

$\hat{R}(k_0) = \frac{1}{P}\sum_{i=0}^{P-1} X_i(k_0)\,X_i^H(k_0)$

wherein $\hat{R}(k_0)$ represents the second covariance matrix, $k_0$ represents the focusing frequency point, P represents the number of frames of the speech signal collected by the microphone array, $X_i(k_0)$ represents the DFT value of the microphone array at the i-th frame and the focusing frequency point, and $X_i^H(k_0)$ represents the conjugate transpose matrix of $X_i(k_0)$.
In the embodiment of the present invention, when decomposing the eigenvalue of the first covariance matrix, optionally, the following manner may be adopted:
decomposing eigenvalues for the first covariance matrix in the following way:

$\hat{R}(k) = U(k)\,\Lambda\,U^H(k)$

wherein $\hat{R}(k)$ represents the first covariance matrix, U(k) represents the first eigenvector matrix of $\hat{R}(k)$, $\Lambda$ represents the diagonal matrix formed by the eigenvalues of $\hat{R}(k)$ arranged in descending order, and $U^H(k)$ represents the conjugate transpose matrix of U(k).
In the embodiment of the present invention, when decomposing the eigenvalue of the second covariance matrix, optionally, the following manner may be adopted:
decomposing the eigenvalues of the second covariance matrix in the following way:

R̂(k₀) = U(k₀) Λ₀ U^H(k₀)

wherein R̂(k₀) represents the second covariance matrix, U(k₀) represents the second eigenvector matrix of R̂(k₀), Λ₀ represents the diagonal matrix formed by arranging the eigenvalues of R̂(k₀) in descending order, and U^H(k₀) represents the conjugate transpose matrix of U(k₀).
In the embodiment of the invention, optionally, X_i(k) takes the form shown in formula two. In the embodiment of the present invention, after the focusing covariance matrix is obtained by calculation, the number of sound sources may be calculated according to the obtained focusing covariance matrix; when doing so, optionally, the following manner may be adopted:
calculating the number of sound sources from the obtained focusing covariance matrix by using the Gerschgorin disk criterion. For example, consider an indoor environment with a room size of 10m × 10m × 3m, whose vertex coordinates include (0,0,0), (0,10,2.5), (0,0,2.5), (10,0,0), (10,10,2.5) and (10,0,2.5). A uniform linear array of 10 microphones is placed between the points (2,4,1.3) and (2,4.9,1.3) with an element spacing of 0.1m; the array elements are isotropic omnidirectional microphones. The positions of the 6 speakers are (8,1,1.3), (8,2.6,1.3), (8,4.2,1.3), (8,5.8,1.3), (8,7.4,1.3) and (8,9,1.3), and the background noise is assumed to be white Gaussian noise. The microphone array and the speakers' voices are processed with the Image simulation model, the voice signals are sampled at a sampling frequency of 8 kHz, and the microphone-array received signals are acquired. The folding-resampling coefficient γ is 0.8 and the number of iterations is 20. The speakers' voice signals are long enough that different data can be taken for each of 50 trials, and the detection probability is:
detection probability = (number of trials in which the number of sound sources is correctly detected) / 50 (formula eight)
If the actual number of speakers is 2, any frame includes 128 sampling frequency points, the number of frames is 100, the parameter d(k) in the Gerschgorin disk criterion is 0.7, and the signal-to-noise ratio varies from −5 dB to 5 dB in steps of 1 dB, then fig. 1C compares how the detection probability varies with the signal-to-noise ratio for the method of constructing the focused covariance matrix provided by the embodiment of the present invention and for the existing CSM (Coherent Signal Subspace Method)-GDE (Gerschgorin Disk Estimator) method. As can be seen from fig. 1C, the CSM-GDE method reaches a detection probability of 0.9 at a signal-to-noise ratio of 0 dB and a detection probability of 1 at 4 dB. Compared with the CSM-GDE method, the scheme provided by the invention greatly improves the probability of correct detection when the signal-to-noise ratio is below 0 dB: its detection probability already reaches 0.9 at a signal-to-noise ratio of −3 dB, and the probability of correct detection reaches 1 as the signal-to-noise ratio increases further.
If the number of speakers is 2, the signal-to-noise ratio is 10 dB, any frame includes 128 sampling frequency points, and the number of frames varies from 5 to 70 in steps of 5, then fig. 1D compares how the detection probability varies with the number of frames for the focused covariance matrix constructed by the method according to the embodiment of the present invention and for the existing CSM-GDE method. As shown in fig. 1D, the CSM-GDE method reaches a detection probability of 0.9 at 40 frames and 1 at 65 frames. When the number of frames is less than 50, the scheme of the present invention greatly improves the detection probability compared with the CSM-GDE method: its detection probability reaches 0.9 at 25 frames and 1 at 50 frames.
Table 1 compares the performance of the method of the present invention, which calculates the number of sound sources by constructing the focused covariance matrix, with the CSM-GDE method for different numbers of speakers. In this experiment, the signal-to-noise ratio was 10 dB, the frame length was 128 points, and the number of frames was 100, while the actual number of speakers varied from 2 to 6. As can be seen from Table 1, when the actual number of speakers is 2 or 3, the detection probabilities of both methods reach 1; when the actual number of speakers exceeds 3, the detection probabilities of both methods gradually decrease as the number of speakers increases, and for the same number of speakers the method of the present invention achieves a higher detection probability than the CSM-GDE method.
TABLE 1 variation of detection probability with actual speaker number
Actual speaker count | 2 | 3 | 4 | 5 | 6 |
CSM-GDE | 1 | 1 | 0.94 | 0.84 | 0.66 |
Scheme of the invention | 1 | 1 | 0.98 | 0.90 | 0.72 |
In the embodiment of the present invention, calculating the number of sound sources from the obtained focusing covariance matrix by using the Gerschgorin disk criterion is a method commonly used in the art, and a detailed description is omitted here.
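Since the Gerschgorin disk criterion is treated as known in the art, the following is only a sketch of one textbook form of the estimator, not the patent's exact procedure: the focused covariance matrix is partitioned, a unitary transform is built from the eigenvectors of the leading block, and the Gerschgorin radii of the transformed matrix are compared against a threshold controlled by the parameter d (playing the role of d(k) above).

```python
import numpy as np

def gde_source_count(R, d=0.7):
    """Estimate the number of sources from an L x L focused covariance
    matrix R with a textbook Gerschgorin disk estimator (GDE)."""
    L = R.shape[0]
    R1 = R[:-1, :-1]                  # leading (L-1) x (L-1) block
    r = R[:-1, -1]                    # cross-covariance column to the last sensor
    _, E = np.linalg.eigh(R1)         # eigenvectors of the leading block
    E = E[:, ::-1]                    # descending eigenvalue order
    radii = np.abs(E.conj().T @ r)    # Gerschgorin radii after the transform
    threshold = d * radii.sum() / (L - 1)
    for n in range(L - 1):
        if radii[n] < threshold:      # first disk that shrinks below the threshold
            return n                  # disks 0..n-1 are attributed to sources
    return L - 1
```

Disks tied to sources stay large after the transform, while noise disks collapse toward zero, so the first radius below the threshold marks the source count.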
For a better understanding of the embodiment of the present invention, a specific application scenario is provided below to further detail the process of constructing a focused covariance matrix based on a speech signal, as shown in fig. 2:
step 200: determining that 100 sampling frequency points are adopted when the microphone array collects the voice signals: sampling frequency point 0, sampling frequency point 1, sampling frequency point 2, … , sampling frequency point 99;
step 210: calculating a first covariance matrix for the sampling frequency point 0 aiming at the sampling frequency point 0;
step 220: determining a focusing frequency point of 100 sampling frequency points;
step 230: calculating a second covariance matrix of the voice signals collected by the microphone array at the focusing frequency point;
step 240: decomposing the eigenvalue of the first covariance matrix to obtain a first eigenvector matrix, and performing conjugate transpose on the first eigenvector matrix to obtain a conjugate transpose matrix of the first eigenvector matrix;
step 250: decomposing the eigenvalue of the second covariance matrix to obtain a second eigenvector matrix;
step 260: taking the product of the conjugate transpose matrix of the first eigenvector matrix and the second eigenvector matrix as a focusing transformation matrix, and performing conjugate transpose on the focusing transformation matrix to obtain a conjugate transpose matrix of the focusing transformation matrix;
step 270: taking the product of the first covariance matrix, the focusing transformation matrix and the conjugate transpose matrix of the focusing transformation matrix as the focusing covariance matrix of the voice signals collected at the sampling frequency point 0;
step 280: and calculating the focus covariance matrixes of other sampling frequency points in a mode of calculating the focus covariance matrix aiming at the sampling frequency point 0, and taking the sum of the focus covariance matrixes aiming at each sampling frequency point as the focus covariance matrix of the voice signals collected by the microphone array.
Based on the above technical solution of the corresponding method, referring to fig. 3A, an embodiment of the present invention provides an apparatus for constructing a focus covariance matrix based on a speech signal, the apparatus includes a determining unit 30, a first calculating unit 31, and a second calculating unit 32, wherein:
the determining unit 30 is configured to determine sampling frequency points adopted when the microphone array collects voice signals;
the first calculating unit 31 is configured to calculate, for any one of the determined sampling frequency points, a first covariance matrix, a focus transform matrix, and a conjugate transpose matrix of the focus transform matrix of the voice signal acquired at the any one sampling frequency point, and take a product of the first covariance matrix, the focus transform matrix, and the conjugate transpose matrix of the focus transform matrix as a focus covariance matrix of the voice signal acquired at the any one sampling frequency point;
and the second calculating unit 32 is configured to use the calculated sum of the focus covariance matrices of the speech signals acquired at each sampling frequency point as the focus covariance matrix of the speech signals acquired by the microphone array.
Optionally, when the first calculating unit 31 calculates the first covariance matrix, specifically:
the first covariance matrix is calculated as follows:

R̂(k) = (1/P)·∑_{i=0}^{P−1} X_i(k)X_i^H(k), k = 0, 1, …, N−1

wherein R̂(k) represents the first covariance matrix, k represents any sampling frequency point, P represents the number of frames of the voice signal collected by the microphone array, X_i(k) represents the Discrete Fourier Transform (DFT) value of the microphone array at the i-th frame and the k-th sampling frequency point, X_i^H(k) represents the conjugate transpose matrix of X_i(k), and N represents the number of sampling frequency points included in any frame; any two different frames include the same number of sampling frequency points.
Further, the determining unit 30 is further configured to determine a focusing frequency point of a sampling frequency point adopted when the microphone array collects the voice signal;
the first calculating unit 31 is further configured to calculate a second covariance matrix of the speech signals collected by the microphone array at the focused frequency point;
when the first calculating unit 31 calculates the focus transformation matrix, it specifically includes:
decomposing the eigenvalue of the first covariance matrix to obtain a first eigenvector matrix, and performing conjugate transpose on the first eigenvector matrix to obtain a conjugate transpose matrix of the first eigenvector matrix;
decomposing the eigenvalue of the second covariance matrix to obtain a second eigenvector matrix;
and taking the product of the conjugate transpose matrix of the first eigenvector matrix and the second eigenvector matrix as a focusing transformation matrix.
Optionally, when the first calculating unit 31 calculates the second covariance matrix, specifically:
the second covariance matrix is calculated as follows:

R̂(k₀) = (1/P)·∑_{i=0}^{P−1} X_i(k₀)X_i^H(k₀)

wherein R̂(k₀) represents the second covariance matrix, k₀ represents the focusing frequency point, P represents the number of frames of the voice signal collected by the microphone array, X_i(k₀) represents the DFT value of the microphone array at the i-th frame and the focusing frequency point, and X_i^H(k₀) represents the conjugate transpose matrix of X_i(k₀).
Optionally, when the first calculating unit 31 decomposes the eigenvalue of the first covariance matrix, specifically:
decomposing eigenvalues for the first covariance matrix in the following way:

R̂(k) = U(k) Λ U^H(k)

wherein R̂(k) represents the first covariance matrix, U(k) represents the first eigenvector matrix of R̂(k), Λ represents the diagonal matrix formed by arranging the eigenvalues of R̂(k) in descending order, and U^H(k) denotes the conjugate transpose matrix of U(k).
Optionally, when the first calculating unit 31 decomposes the eigenvalue of the second covariance matrix, specifically:
decomposing the eigenvalues of the second covariance matrix in the following way:

R̂(k₀) = U(k₀) Λ₀ U^H(k₀)

wherein R̂(k₀) represents the second covariance matrix, U(k₀) represents the second eigenvector matrix of R̂(k₀), Λ₀ represents the diagonal matrix formed by arranging the eigenvalues of R̂(k₀) in descending order, and U^H(k₀) represents the conjugate transpose matrix of U(k₀).
Optionally, Xi(k) The form is as follows:
Xi(k)=[Xi1(k),Xi2(k),......,XiL(k)]T,i=0,1,2,......,P-1
wherein: xi1(k) The DFT value and X of the 1 st array element of the microphone array at the ith frame and the kth sampling frequency point are representedi2(k) Indication wheatDFT value … …, X of 2 nd array element of wind array at ith frame and kth sampling frequency pointiL(k) And the DFT value of the Lth array element of the microphone array in the ith frame and the kth sampling frequency point is represented, and L is the number of the array elements included by the microphone array.
As shown in fig. 3B, another schematic structural diagram of an apparatus for constructing a focus covariance matrix based on a speech signal according to an embodiment of the present invention includes at least one processor 301, a communication bus 302, a memory 303, and at least one communication interface 304.
The communication bus 302 is used for realizing connection and communication among the above components, and the communication interface 304 is used for connecting and communicating with an external device.
The memory 303 is used for storing executable program codes, and the processor 301 executes the program codes to:
determining sampling frequency points adopted when a microphone array collects voice signals;
aiming at any one of the determined sampling frequency points, calculating a first covariance matrix, a focusing transformation matrix and a conjugate transpose matrix of a focusing transformation matrix of the voice signal acquired at any one of the sampling frequency points, and taking the product of the first covariance matrix, the focusing transformation matrix and the conjugate transpose matrix of the focusing transformation matrix as a focusing covariance matrix of the voice signal acquired at any one of the sampling frequency points;
and taking the sum of the focus covariance matrixes of the voice signals acquired at the sampling frequency points as the focus covariance matrix of the voice signals acquired by the microphone array.
Optionally, when the processor 301 calculates the first covariance matrix, specifically:
the first covariance matrix is calculated as follows:

R̂(k) = (1/P)·∑_{i=0}^{P−1} X_i(k)X_i^H(k), k = 0, 1, …, N−1

wherein R̂(k) represents the first covariance matrix, k represents any sampling frequency point, P represents the number of frames of the voice signal collected by the microphone array, X_i(k) represents the Discrete Fourier Transform (DFT) value of the microphone array at the i-th frame and the k-th sampling frequency point, X_i^H(k) represents the conjugate transpose matrix of X_i(k), and N represents the number of sampling frequency points included in any frame; any two different frames include the same number of sampling frequency points.
Further, before the processor 301 calculates the focus transformation matrix, the method further includes:
determining a focusing frequency point of sampling frequency points adopted when a microphone array collects voice signals;
calculating a second covariance matrix of the voice signals collected by the microphone array at the focusing frequency point;
calculating a focus transformation matrix, specifically comprising:
decomposing the eigenvalue of the first covariance matrix to obtain a first eigenvector matrix, and performing conjugate transpose on the first eigenvector matrix to obtain a conjugate transpose matrix of the first eigenvector matrix;
decomposing the eigenvalue of the second covariance matrix to obtain a second eigenvector matrix;
and taking the product of the conjugate transpose matrix of the first eigenvector matrix and the second eigenvector matrix as a focusing transformation matrix.
Optionally, when the processor 301 calculates the second covariance matrix, specifically:
the second covariance matrix is calculated as follows:

R̂(k₀) = (1/P)·∑_{i=0}^{P−1} X_i(k₀)X_i^H(k₀)

wherein R̂(k₀) represents the second covariance matrix, k₀ represents the focusing frequency point, P represents the number of frames of the voice signal collected by the microphone array, X_i(k₀) represents the DFT value of the microphone array at the i-th frame and the focusing frequency point, and X_i^H(k₀) represents the conjugate transpose matrix of X_i(k₀).
Optionally, when the processor 301 decomposes the eigenvalue of the first covariance matrix, specifically:
decomposing eigenvalues for the first covariance matrix in the following way:

R̂(k) = U(k) Λ U^H(k)

wherein R̂(k) represents the first covariance matrix, U(k) represents the first eigenvector matrix of R̂(k), Λ represents the diagonal matrix formed by arranging the eigenvalues of R̂(k) in descending order, and U^H(k) denotes the conjugate transpose matrix of U(k).
Optionally, when the processor 301 decomposes the eigenvalue of the second covariance matrix, specifically:
decomposing the eigenvalues of the second covariance matrix in the following way:

R̂(k₀) = U(k₀) Λ₀ U^H(k₀)

wherein R̂(k₀) represents the second covariance matrix, U(k₀) represents the second eigenvector matrix of R̂(k₀), Λ₀ represents the diagonal matrix formed by arranging the eigenvalues of R̂(k₀) in descending order, and U^H(k₀) represents the conjugate transpose matrix of U(k₀).
In the embodiment of the invention, optionally, Xi(k) The form is as follows:
Xi(k)=[Xi1(k),Xi2(k),......,XiL(k)]T,i=0,1,2,......,P-1
wherein: xi1(k) The DFT value and X of the 1 st array element of the microphone array at the ith frame and the kth sampling frequency point are representedi2(k) DFT values … …, X of 2 nd array element of microphone array at ith frame and kth sampling frequency pointiL(k) Representing the Lth array element of the microphone array at the ith frame and the kth sampling frequency pointThe DFT value, L, is the number of array elements included in the microphone array.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass such modifications and variations.
Claims (14)
1. A method for constructing a focus covariance matrix based on a speech signal, comprising:
determining sampling frequency points adopted when a microphone array collects voice signals;
aiming at any one of the determined sampling frequency points, calculating a first covariance matrix, a focusing transformation matrix and a conjugate transpose matrix of the focusing transformation matrix of the voice signal acquired at the any one sampling frequency point, and taking the product of the first covariance matrix, the focusing transformation matrix and the conjugate transpose matrix of the focusing transformation matrix as the focusing covariance matrix of the voice signal acquired at the any one sampling frequency point;
and taking the sum of the focus covariance matrixes of the voice signals acquired at the sampling frequency points as the focus covariance matrix of the voice signals acquired by the microphone array.
2. The method of claim 1, wherein computing the first covariance matrix specifically comprises:
calculating the first covariance matrix as follows:

R̂(k) = (1/P)·∑_{i=0}^{P−1} X_i(k)X_i^H(k), k = 0, 1, …, N−1

wherein the R̂(k) represents the first covariance matrix, the k represents the arbitrary sampling frequency point, the P represents the number of frames of the microphone array collecting the speech signal, the X_i(k) represents the Discrete Fourier Transform (DFT) value of the microphone array at the i-th frame and the arbitrary sampling frequency point, the X_i^H(k) represents the conjugate transpose matrix of the X_i(k), the N represents the number of sampling frequency points included in any one frame, and the number of sampling frequency points included in any two different frames is the same.
3. The method of claim 1 or 2, wherein prior to computing the focus transform matrix, further comprising:
determining a focusing frequency point of sampling frequency points adopted when the microphone array collects voice signals;
calculating a second covariance matrix of the voice signals collected by the microphone array at the focusing frequency point;
calculating the focus transformation matrix specifically comprises:
decomposing an eigenvalue of the first covariance matrix to obtain a first eigenvector matrix, and performing conjugate transpose on the first eigenvector matrix to obtain a conjugate transpose matrix of the first eigenvector matrix;
decomposing the eigenvalue of the second covariance matrix to obtain a second eigenvector matrix;
and taking the product of the conjugate transpose matrix of the first eigenvector matrix and the second eigenvector matrix as the focusing transformation matrix.
4. The method of claim 3, wherein computing the second covariance matrix specifically comprises:
calculating the second covariance matrix as follows:

R̂(k₀) = (1/P)·∑_{i=0}^{P−1} X_i(k₀)X_i^H(k₀)

wherein the R̂(k₀) represents the second covariance matrix, the k₀ represents the focusing frequency point, the P represents the number of frames of the speech signal collected by the microphone array, the X_i(k₀) represents the DFT value of the microphone array at the i-th frame and the focusing frequency point, and the X_i^H(k₀) represents the conjugate transpose matrix of the X_i(k₀).
5. The method of claim 3 or 4, wherein decomposing eigenvalues for the first covariance matrix specifically comprises:
decomposing eigenvalues for the first covariance matrix in the following manner:

R̂(k) = U(k) Λ U^H(k)

wherein the R̂(k) represents the first covariance matrix, the U(k) represents the first eigenvector matrix of the R̂(k), the Λ represents the diagonal matrix formed by arranging the eigenvalues of the R̂(k) in descending order, and the U^H(k) represents the conjugate transpose matrix of the U(k).
6. The method of any one of claims 3-5, wherein decomposing eigenvalues for the second covariance matrix specifically comprises:
decomposing eigenvalues for the second covariance matrix in the following manner:

R̂(k₀) = U(k₀) Λ₀ U^H(k₀)

wherein the R̂(k₀) represents the second covariance matrix, the U(k₀) represents the second eigenvector matrix of the R̂(k₀), the Λ₀ represents the diagonal matrix formed by arranging the eigenvalues of the R̂(k₀) in descending order, and the U^H(k₀) represents the conjugate transpose matrix of the U(k₀).
7. The method of any one of claims 2 to 6, wherein X isi(k) The form is as follows:
Xi(k)=[Xi1(k),Xi2(k),......,XiL(k)]T,i=0,1,2,......,P-1
wherein: xi1(k) The DFT value and X of the 1 st array element of the microphone array at the ith frame and the kth sampling frequency point are representedi2(k) The DFT value and X of the 2 nd array element of the microphone array at the ith frame and the kth sampling frequency point are representediL(k) And the DFT value of the Lth array element of the microphone array at the ith frame and the kth sampling frequency point is represented, and L is the number of array elements included by the microphone array.
8. An apparatus for constructing a focus covariance matrix based on a speech signal, comprising:
the determining unit is used for determining sampling frequency points adopted when the microphone array collects voice signals;
the first calculation unit is used for calculating, for any one of the determined sampling frequency points, a first covariance matrix, a focus transformation matrix and a conjugate transpose matrix of the focus transformation matrix of the voice signal acquired at the any one sampling frequency point, and taking the product of the first covariance matrix, the focus transformation matrix and the conjugate transpose matrix of the focus transformation matrix as a focus covariance matrix of the voice signal acquired at the any one sampling frequency point;
and the second calculation unit is used for taking the sum of the focus covariance matrixes of the voice signals acquired at the sampling frequency points as the focus covariance matrix of the voice signals acquired by the microphone array.
9. The apparatus of claim 8, wherein the first computing unit, when computing the first covariance matrix, is specifically:
calculating the first covariance matrix as follows:

R̂(k) = (1/P)·∑_{i=0}^{P−1} X_i(k)X_i^H(k), k = 0, 1, …, N−1

wherein the R̂(k) represents the first covariance matrix, the k represents the arbitrary sampling frequency point, the P represents the number of frames of the microphone array collecting the speech signal, the X_i(k) represents the Discrete Fourier Transform (DFT) value of the microphone array at the i-th frame and the arbitrary sampling frequency point, the X_i^H(k) represents the conjugate transpose matrix of the X_i(k), the N represents the number of sampling frequency points included in any one frame, and the number of sampling frequency points included in any two different frames is the same.
10. The device according to claim 8 or 9, wherein the determining unit is further configured to determine a focusing frequency point of sampling frequency points adopted when the microphone array collects the voice signals;
the first calculation unit is further configured to calculate a second covariance matrix of the voice signals collected by the microphone array at the focused frequency point;
when the first calculating unit calculates the focus transformation matrix, the method specifically includes:
decomposing an eigenvalue of the first covariance matrix to obtain a first eigenvector matrix, and performing conjugate transpose on the first eigenvector matrix to obtain a conjugate transpose matrix of the first eigenvector matrix;
decomposing the eigenvalue of the second covariance matrix to obtain a second eigenvector matrix;
and taking the product of the conjugate transpose matrix of the first eigenvector matrix and the second eigenvector matrix as the focusing transformation matrix.
11. The apparatus according to claim 10, wherein the first calculating unit, when calculating the second covariance matrix, is specifically:
calculating the second covariance matrix as follows:

R̂(k₀) = (1/P)·∑_{i=0}^{P−1} X_i(k₀)X_i^H(k₀)

wherein the R̂(k₀) represents the second covariance matrix, the k₀ represents the focusing frequency point, the P represents the number of frames of the speech signal collected by the microphone array, the X_i(k₀) represents the DFT value of the microphone array at the i-th frame and the focusing frequency point, and the X_i^H(k₀) represents the conjugate transpose matrix of the X_i(k₀).
12. The apparatus according to claim 10 or 11, wherein the first computing unit, when decomposing eigenvalues for the first covariance matrix, is specifically:
decomposing eigenvalues for the first covariance matrix in the following manner:

R̂(k) = U(k) Λ U^H(k)

wherein the R̂(k) represents the first covariance matrix, the U(k) represents the first eigenvector matrix of the R̂(k), the Λ represents the diagonal matrix formed by arranging the eigenvalues of the R̂(k) in descending order, and the U^H(k) represents the conjugate transpose matrix of the U(k).
13. The apparatus according to any of claims 10-12, wherein the first computing unit, when decomposing eigenvalues for the second covariance matrix, is specifically:
decomposing eigenvalues for the second covariance matrix in the following manner:

R̂(k₀) = U(k₀) Λ₀ U^H(k₀)

wherein the R̂(k₀) represents the second covariance matrix, the U(k₀) represents the second eigenvector matrix of the R̂(k₀), the Λ₀ represents the diagonal matrix formed by arranging the eigenvalues of the R̂(k₀) in descending order, and the U^H(k₀) represents the conjugate transpose matrix of the U(k₀).
14. The apparatus of any one of claims 9-13, wherein X isi(k) The form is as follows:
Xi(k)=[Xi1(k),Xi2(k),......,XiL(k)]T,i=0,1,2,......,P-1
wherein: xi1(k) The DFT value and X of the 1 st array element of the microphone array at the ith frame and the kth sampling frequency point are representedi2(k) The DFT value and X of the 2 nd array element of the microphone array at the ith frame and the kth sampling frequency point are representediL(k) And the DFT value of the Lth array element of the microphone array at the ith frame and the kth sampling frequency point is represented, and L is the number of array elements included by the microphone array.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510052368.7A CN104599679A (en) | 2015-01-30 | 2015-01-30 | Speech signal based focus covariance matrix construction method and device |
PCT/CN2015/082571 WO2016119388A1 (en) | 2015-01-30 | 2015-06-26 | Method and device for constructing focus covariance matrix on the basis of voice signal |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510052368.7A CN104599679A (en) | 2015-01-30 | 2015-01-30 | Speech signal based focus covariance matrix construction method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104599679A true CN104599679A (en) | 2015-05-06 |
Family
ID=53125412
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510052368.7A Pending CN104599679A (en) | 2015-01-30 | 2015-01-30 | Speech signal based focus covariance matrix construction method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN104599679A (en) |
WO (1) | WO2016119388A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016119388A1 (en) * | 2015-01-30 | 2016-08-04 | 华为技术有限公司 | Method and device for constructing focus covariance matrix on the basis of voice signal |
CN108538306A (en) * | 2017-12-29 | 2018-09-14 | 北京声智科技有限公司 | Method and device for improving DOA estimation of voice equipment |
CN110992977A (en) * | 2019-12-03 | 2020-04-10 | 北京声智科技有限公司 | Method and device for extracting target sound source |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110501727B (en) * | 2019-08-13 | 2023-10-20 | 中国航空工业集团公司西安飞行自动控制研究所 | Satellite navigation anti-interference method based on space-frequency adaptive filtering |
CN111696570B (en) * | 2020-08-17 | 2020-11-24 | 北京声智科技有限公司 | Voice signal processing method, device, equipment and storage medium |
CN113409804B (en) * | 2020-12-22 | 2024-08-09 | 声耕智能科技(西安)研究院有限公司 | Multichannel frequency domain voice enhancement algorithm based on variable expansion into generalized subspace |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040220800A1 (en) * | 2003-05-02 | 2004-11-04 | Samsung Electronics Co., Ltd | Microphone array method and system, and speech recognition method and system using the same |
CN102568493A (en) * | 2012-02-24 | 2012-07-11 | 大连理工大学 | Underdetermined blind source separation (UBSS) method based on maximum matrix diagonal rate |
CN102664666A (en) * | 2012-04-09 | 2012-09-12 | 电子科技大学 | Efficient robust self-adapting beam forming method of broadband |
CN104166120A (en) * | 2014-07-04 | 2014-11-26 | 哈尔滨工程大学 | Acoustic vector circular matrix steady broadband MVDR orientation estimation method |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102621527B (en) * | 2012-03-20 | 2014-06-11 | 哈尔滨工程大学 | Broad band coherent source azimuth estimating method based on data reconstruction |
CN104599679A (en) * | 2015-01-30 | 2015-05-06 | 华为技术有限公司 | Speech signal based focus covariance matrix construction method and device |
- 2015-01-30: CN application CN201510052368.7A (publication CN104599679A/en), status: active, Pending
- 2015-06-26: WO application PCT/CN2015/082571 (publication WO2016119388A1/en), status: active, Application Filing
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016119388A1 (en) * | 2015-01-30 | 2016-08-04 | 华为技术有限公司 | Method and device for constructing focus covariance matrix on the basis of voice signal |
CN108538306A (en) * | 2017-12-29 | 2018-09-14 | 北京声智科技有限公司 | Method and device for improving DOA estimation of voice equipment |
CN108538306B (en) * | 2017-12-29 | 2020-05-26 | 北京声智科技有限公司 | Method and device for improving DOA estimation of voice equipment |
CN110992977A (en) * | 2019-12-03 | 2020-04-10 | 北京声智科技有限公司 | Method and device for extracting target sound source |
CN110992977B (en) * | 2019-12-03 | 2021-06-22 | 北京声智科技有限公司 | Method and device for extracting target sound source |
Also Published As
Publication number | Publication date |
---|---|
WO2016119388A1 (en) | 2016-08-04 |
Similar Documents
Publication | Title
---|---
US10602267B2 (en) | Sound signal processing apparatus and method for enhancing a sound signal
JP7011075B2 (en) | Target voice acquisition method and device based on microphone array
CN104599679A (en) | Speech signal based focus covariance matrix construction method and device
US8223988B2 (en) | Enhanced blind source separation algorithm for highly correlated mixtures
CN108417224B (en) | Training and recognition method and system of bidirectional neural network model
CN106558315B (en) | Heterogeneous microphone automatic gain calibration method and system
US10818302B2 (en) | Audio source separation
US9378754B1 (en) | Adaptive spatial classifier for multi-microphone systems
CN110610718B (en) | Method and device for extracting expected sound source voice signal
US10839820B2 (en) | Voice processing method, apparatus, device and storage medium
Zhang et al. | Multi-channel multi-frame ADL-MVDR for target speech separation
Niwa et al. | Post-filter design for speech enhancement in various noisy environments
Wang et al. | Recurrent deep stacking networks for supervised speech separation
CN110164468B (en) | Speech enhancement method and device based on double microphones
CN103180752B (en) | Apparatus and method for resolving ambiguity in direction-of-arrival estimation
CN104898086A (en) | Sound-source orientation method based on sound intensity estimation for a miniature microphone array
CN114242104A (en) | Method, device and equipment for voice noise reduction and storage medium
CN112997249B (en) | Voice processing method, device, storage medium and electronic equipment
Kim et al. | Multi-microphone target signal enhancement using generalized sidelobe canceller controlled by phase error filter
US20060256978A1 (en) | Sparse signal mixing model and application to noisy blind source separation
US20190355374A1 (en) | Method and apparatus for reducing noise of mixed signal
CN117782625A (en) | Vehicle fault acoustic detection method, system, control device and storage medium
CN115831145A (en) | Double-microphone speech enhancement method and system
CN111048096B (en) | Voice signal processing method and device and terminal
Ji et al. | Coherence-Based Dual-Channel Noise Reduction Algorithm in a Complex Noisy Environment
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | | Application publication date: 2015-05-06