WO2018133056A1 - Method and device for sound source localization - Google Patents
Method and device for sound source localization
- Publication number
- WO2018133056A1 (PCT/CN2017/072014, CN2017072014W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sound source
- audio signal
- signal
- channel
- frequency domain
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S3/00—Systems employing more than two channels, e.g. quadraphonic
- H04S3/02—Systems employing more than two channels, e.g. quadraphonic of the matrix type, i.e. in which input signals are combined algebraically, e.g. after having been phase shifted with respect to each other
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S5/00—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
- G01S5/18—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S5/00—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
- G01S5/18—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
- G01S5/30—Determining absolute distances from a plurality of spaced points of known location
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/04—Circuits for transducers, loudspeakers or microphones for correcting frequency response
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R5/00—Stereophonic arrangements
- H04R5/04—Circuit arrangements, e.g. for selective connection of amplifier inputs/outputs to loudspeakers, for loudspeaker detection, or for adaptation of settings to personal preferences or hearing impairments
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2201/00—Details of transducers, loudspeakers or microphones covered by H04R1/00 but not provided for in any of its subgroups
- H04R2201/40—Details of arrangements for obtaining desired directional characteristic by combining a number of identical transducers covered by H04R1/40 but not provided for in any of its subgroups
- H04R2201/401—2D or 3D arrays of transducers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2430/00—Signal processing covered by H04R, not provided for in its groups
- H04R2430/20—Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic
- H04R2430/23—Direction finding using a sum-delay beam-former
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/03—Aspects of down-mixing multi-channel audio to configurations with lower numbers of playback channels, e.g. 7.1 -> 5.1
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/11—Application of ambisonics in stereophonic audio systems
Definitions
- the present invention relates to the field of sound source localization technology, and in particular, to a sound source localization method and apparatus.
- Sound source localization has been widely studied since the 1970s and 1980s. With the development of science and technology, the demand for audio quality in every field keeps rising.
- The object of audio research has gradually moved from the original single channel (mono) to stereo, surround, and 3D (3-dimensional) audio. Unlike single-channel audio, multi-channel audio is usually obtained through a microphone array.
- Microphone-array sound source localization based on Direction of Arrival (DOA) estimation is a research hotspot in many fields and is widely used in sonar, video teleconferencing, artificial intelligence, seismic research, voice tracking and recognition, monitoring devices, and so on.
- DOA Direction of Arrival
- Existing DOA methods mainly perform detection on microphone arrays located in a single plane: they perform eigenvalue decomposition on the covariance matrix of the frequency-domain signals of the input multi-channel audio and then estimate the direction of the sound source from the eigenvector corresponding to the largest eigenvalue.
- the specific steps are:
- the time-frequency transform can be implemented by using a Discrete Fourier Transform (DFT), a Fast Fourier Transformation (FFT), or a Modified Discrete Cosine Transform (MDCT) technique;
- DFT Discrete Fourier Transform
- FFT Fast Fourier Transformation
- MDCT Modified Discrete Cosine Transform
- n is the label of the audio frame in the audio signal
- k is the label of the frequency point in the frequency domain signal
- k_l and k_u are respectively the start and cutoff frequency bins of the covariance matrix computation, which is followed by smoothing.
- the DOA detection is combined with the height information, thereby effectively improving the accuracy of the DOA detection, and solving the problem that the detection result of the existing DOA method is inaccurate.
- a method of sound source localization comprising the following steps:
- Step 1: obtain M channels of audio signals in a preset format through microphone arrays located in different planes, where M is a positive integer;
- Step 2: preprocess the M channels of preset-format audio signals and project them onto a single plane to obtain N channels of audio signals, where N is a positive integer and M ≥ N;
- Step 3: perform a time-frequency transform on each of the N processed audio channels to obtain the frequency-domain signals of the N channels;
- Step 4: compute the covariance matrix of the frequency-domain signals and smooth the covariance matrix;
- Step 5: perform eigenvalue decomposition on the smoothed covariance matrix to obtain N eigenvalues and the corresponding eigenvectors;
- Step 6: estimate the sound source direction from the eigenvector corresponding to the largest eigenvalue to obtain the sound source orientation parameter.
- the audio signal of the preset format is an Ambisonic A format audio signal, specifically four audio signals (LFU, RFD, LBD, RBU) located in different planes.
- the preprocessing process in the step 2 is:
- In the conversion matrix, the angle parameter is a height (elevation) angle, and f(·) is a function of that angle.
- the microphone array picks up audio
- The values of the elements a11, a12, ..., a44 of the conversion matrix A are constants determined by the sound source scene.
- the preprocessing process in the step 2 is:
- Step 21 Convert the 4-way Ambisonic A format audio signal to the Ambisonic B format audio signal (W, X, Y, Z) through the conversion matrix A:
- Step 22 Estimating a divergence parameter based on an energy of a Z signal in the B format audio signal
- Step 23 Determine whether the divergence is greater than a set threshold
- The time-frequency transform in step 3 can be implemented with the Discrete Fourier Transform (DFT), the Fast Fourier Transformation (FFT), or the Modified Discrete Cosine Transform (MDCT).
- In step 3, the resulting frequency-domain signals are divided into several sub-bands;
- Step 4 computes and smooths the covariance matrix of each sub-band separately;
- Step 5 performs eigenvalue decomposition on the smoothed covariance matrix of each sub-band to obtain the N eigenvalues and corresponding eigenvectors of each sub-band covariance matrix;
- Step 6 estimates the sound source direction for each sub-band from the eigenvector corresponding to the largest eigenvalue and combines the per-sub-band direction results to obtain the sound source orientation parameter.
- A device for sound source localization comprising a preset-format audio signal acquisition unit, a signal preprocessing unit, a time-frequency transform unit, a frequency-domain signal processing unit, and a sound source orientation estimation unit, wherein:
- the preset-format audio signal acquisition unit is configured to acquire M channels of audio signals in a preset format through microphone arrays located in different planes and to send the M channels of preset-format audio signals to the signal preprocessing unit;
- the signal preprocessing unit is configured to preprocess the received M channels of preset-format audio signals, project them onto a single plane to obtain N channels of audio signals, and send the N channels of audio signals to the time-frequency transform unit;
- the time-frequency transform unit is configured to perform a time-frequency transform on each of the received N audio channels to obtain the frequency-domain signals of the N channels;
- the frequency-domain signal processing unit is configured to process the frequency-domain signals: compute their covariance matrix, smooth it, perform eigenvalue decomposition on the covariance matrix, and send the resulting eigenvalues and eigenvectors to the sound source orientation estimation unit;
- the sound source orientation estimation unit is configured to estimate the sound source direction from the eigenvector corresponding to the largest of the eigenvalues to obtain the sound source orientation parameter.
- The accuracy of DOA detection can be effectively improved, and the divergence parameter estimated from the energy of the Z signal can be used both to perform DOA detection adaptively on the input multi-channel audio and to judge the accuracy of the DOA detection.
- Removing the error caused by the height information effectively improves the horizontal resolution.
- FIG. 1 is a schematic flow chart of a method for positioning a sound source according to a preferred embodiment of the present invention.
- FIG. 2 is a schematic diagram of four audio signals in a preferred embodiment of the present invention.
- FIG. 3 is a schematic flow chart of a method for positioning a sound source in another preferred embodiment of the present invention.
- FIG. 4 is a schematic flow chart of a method for positioning a sound source in another preferred embodiment of the present invention.
- FIG. 5 is a schematic flow chart of a method for positioning a sound source according to another preferred embodiment of the present invention.
- FIG. 6 is a functional unit diagram of a device for positioning a sound source in a preferred embodiment of the present invention.
- an embodiment of the present invention provides a method for sound source localization, where the method includes the following steps:
- Step S100 Obtain an audio signal of an M-channel preset format by using a microphone array located in different planes.
- The M channels of preset-format audio signals may be four Ambisonic A-format audio signals (LFU, RFD, LBD, RBU); see FIG. 2.
- Step S200 Pre-process the audio signal of the M-channel preset format, and project it onto the same plane to obtain N-channel audio signals.
- The values of the elements a11, a12, ..., a34 of the transformation matrix A are constants determined by the sound source scene.
- Converting Ambisonic A format audio signals into LRS format audio signals can eliminate errors caused by height information and obtain more accurate detection results.
- In the transformation matrix, the angle parameter is a height (elevation) angle, and f(·) is a function of that angle.
- the microphone array picks up the audio
- The values of the elements a11, a12, ..., a44 of the conversion matrix A are constants determined by the sound source scene.
- the four-channel audio detection method can effectively improve the horizontal resolution.
- Step S300 Perform time-frequency transform on the processed N-channel audio signals to obtain a frequency domain signal of the N-channel audio signals.
- the time-frequency transform may be a Discrete Fourier Transform (DFT), a Fast Fourier Transformation (FFT), or a Modified Discrete Cosine Transform (MDCT).
- Step S400 Calculate a covariance matrix of the frequency domain signal, and perform smoothing processing on the covariance matrix.
- the calculation of the covariance matrix may be set in a specific frequency band, or the covariance matrix of each sub-band may be separately calculated after dividing the entire frequency band into sub-bands.
- n is the label of the audio frame in the audio signal
- k is the label of the frequency point in the frequency domain signal
- k l and k u are the starting frequency and the cutoff frequency of the covariance matrix respectively.
- Step S500 performing eigenvalue decomposition on the smoothed covariance matrix to obtain N eigenvalues and corresponding feature vectors.
- Step S600 estimating a sound source direction according to a feature vector corresponding to the maximum feature value, and obtaining a sound source orientation parameter.
- the specific method for estimating the sound source direction according to the feature vector corresponding to the maximum feature value is:
- Take the inner product of the maximum eigenvector (the eigenvector of the largest eigenvalue) with the steering vectors and search for the index at which the inner product is largest; that index corresponds to the sound source direction.
- the steering vector is:
- K is the order of the steering vector and is usually determined based on the positioning accuracy.
- the inner product D of the largest eigenvector V and the steering vector P is:
- The frequency-domain signals obtained in step S300 can also be divided into several sub-bands; step S400 computes and smooths the covariance matrix of each sub-band separately; step S500 performs eigenvalue decomposition on the smoothed covariance matrix of each sub-band to obtain the N eigenvalues and corresponding eigenvectors of each sub-band covariance matrix; and step S600 estimates the sound source direction for each sub-band from the eigenvector corresponding to the largest eigenvalue and combines the per-sub-band results to obtain the sound source orientation parameter.
- the embodiment of the present invention can also perform adaptive DOA detection on the input 4-channel Ambisonic A format audio signal according to the divergence parameter. Referring to FIG. 5, the specific steps are as follows:
- Step S100 Obtain 4 channels of Ambisonic A format audio signals (LFU, RFD, LBD, RBU) through microphone arrays located in different planes.
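- Step S200: preprocess the four Ambisonic A-format audio signals, projecting them onto a single plane to obtain four B-format audio signals (W, X, Y, Z) in the same plane, and, based on the four B-format audio signals, decide whether to estimate the sound source direction with three (N = 3) audio channels (L, R, S) or four (N = 4) audio channels.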
- Step S201 converting the 4-way Ambisonic A format signal into Ambisonic B format audio (W, X, Y, Z) through the conversion matrix A:
- Step S202 Estimating a divergence parameter based on an energy of a Z signal in the B format audio signal
- Step S203 Determine whether the divergence is greater than a certain threshold, wherein the threshold is set by an empirical value according to different scenarios.
- the value of the threshold may be [0.3, 0.6];
- Step S300 Perform time-frequency transform on the processed N-channel audio signals to obtain a frequency domain signal of the N-channel audio signals.
- The time-frequency transform can be implemented with the Discrete Fourier Transform (DFT), the Fast Fourier Transformation (FFT), or the Modified Discrete Cosine Transform (MDCT).
- Step S400 Calculate a covariance matrix of the frequency domain signal and perform smoothing processing.
- the calculation of the covariance matrix may be set in a specific frequency band, or the covariance matrix of each sub-band may be separately calculated after dividing the entire frequency band into sub-bands.
- Step S500 performing eigenvalue decomposition on the smoothed covariance matrix to obtain N eigenvalues and corresponding feature vectors.
- Step S600 estimating a sound source direction according to a feature vector corresponding to the maximum feature value, and obtaining a sound source orientation parameter.
- the specific method for estimating the sound source direction according to the feature vector corresponding to the maximum eigenvalue is:
- Take the inner product of the maximum eigenvector (the eigenvector of the largest eigenvalue) with the steering vectors and search for the index at which the inner product is largest; that index corresponds to the sound source direction.
- The divergence parameter can also serve as a reference for the reliability of the DOA result.
- When the divergence parameter is small, the DOA result is more reliable; when the divergence parameter is large, the DOA result is less reliable.
- the DOA detection is performed on the input multi-channel audio adaptively based on the divergence parameter obtained by the energy estimation of the Z signal, and the accuracy of the azimuth detection can be improved at a lower complexity.
- a sound source localization apparatus includes a preset format audio signal acquisition unit 100, a signal pre-processing unit 200, a time-frequency transform unit 300, a frequency domain signal processing unit 400, and a sound source orientation estimation unit 500.
- the audio signal acquiring unit 100 of the preset format is configured to acquire an audio signal of an M-channel preset format by using a microphone array located in different planes, and send the audio signal of the M-channel preset format to the signal pre-processing unit 200.
- The signal preprocessing unit 200 is configured to preprocess the received M channels of preset-format audio signals, project them onto a single plane to obtain N channels of audio signals, and send the N channels of audio signals to the time-frequency transform unit 300.
- The time-frequency transform unit 300 is configured to perform a time-frequency transform on each of the received N audio channels to obtain the frequency-domain signals of the N channels, and to send these frequency-domain signals to the frequency-domain signal processing unit 400.
- The frequency-domain signal processing unit 400 is configured to process the frequency-domain signals of the N audio channels: compute their covariance matrix, smooth it, perform eigenvalue decomposition on the covariance matrix, and send the resulting eigenvalues and eigenvectors to the sound source orientation estimation unit 500.
- the sound source orientation estimating unit 500 is configured to estimate a sound source direction according to a feature vector corresponding to the largest feature value among the feature values, to obtain a sound source orientation parameter.
- the device disclosed in this embodiment projects the Ambisonic audio signals located on different planes onto the same plane and performs detection, which can effectively improve the accuracy of DOA detection.
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Otolaryngology (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Algebra (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Mathematical Physics (AREA)
- Pure & Applied Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Circuit For Audible Band Transducer (AREA)
- Stereophonic System (AREA)
- Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
Abstract
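A method and device for sound source localization. The method comprises: obtaining M channels of audio signals in a preset format through microphone arrays located in different planes (S100); preprocessing the M channels of preset-format audio signals and projecting them onto a single plane to obtain N channels of audio signals, where M ≥ N (S200); performing a time-frequency transform on each of the N processed channels to obtain the frequency-domain signals of the N channels (S300); computing the covariance matrix of the frequency-domain signals and smoothing it (S400); performing eigenvalue decomposition on the smoothed covariance matrix (S500); and estimating the sound source direction from the eigenvector corresponding to the largest eigenvalue to obtain a sound source orientation parameter (S600). The method performs DOA detection in combination with height information and can effectively improve the accuracy of DOA detection.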
Description
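The present invention relates to the technical field of sound source localization, and in particular to a method and device for sound source localization.
Sound source localization has been widely studied since the 1970s and 1980s. With the development of science and technology, the demand for audio quality in every field keeps rising, and the object of audio research has gradually moved from the original single channel (mono) to stereo, surround, and 3D (3-dimensional) audio. Unlike single-channel audio, multi-channel audio is usually obtained through a microphone array. At present, microphone-array sound source localization based on Direction of Arrival (DOA) estimation is a research hotspot in many fields and is widely used in sonar, video teleconferencing, artificial intelligence, seismic research, voice tracking and recognition, monitoring devices, and so on.
Existing DOA methods mainly perform detection on microphone arrays located in a single plane: they perform eigenvalue decomposition on the covariance matrix of the frequency-domain signals of the input multi-channel audio and then estimate the direction of the sound source from the eigenvector corresponding to the largest eigenvalue. The specific steps are: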
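a) obtain multi-channel audio signals located in the same plane;
b) perform a time-frequency transform channel by channel to obtain the frequency-domain signals of the multi-channel audio, then compute the covariance matrix within a specific frequency band and smooth it.
The time-frequency transform can be implemented with techniques such as the Discrete Fourier Transform (DFT), the Fast Fourier Transformation (FFT), or the Modified Discrete Cosine Transform (MDCT);
The covariance matrix is computed as:
where n is the index of the audio frame in the audio signal; k is the index of the frequency bin in the frequency-domain signal; X(n,k) is the matrix formed by the values of the k-th frequency bin in the n-th frame, specifically X(n,k)=[X1(n,k) X2(n,k) …], where Xi, i=1,2,..., are the frequency-domain signals of the audio channels; and kl and ku are respectively the start and cutoff frequency bins of the covariance matrix computation. The smoothing is:
where α is the smoothing factor, and α=0.9.
c) perform eigenvalue decomposition on the smoothed covariance matrix to obtain the eigenvalues and corresponding eigenvectors;
d) estimate the sound source direction from the eigenvector corresponding to the largest eigenvalue to obtain the sound source orientation parameter.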
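As a hedged reconstruction from the symbol definitions in step b), the band-limited covariance and its recursive smoothing could be written as shown below. The exact form is not given in the text here, so the conjugation convention, the absence of a normalisation factor, the use of α as the weight on the previous frame, and the symbol cov_s for the smoothed matrix are all assumptions of this sketch.

```latex
\mathrm{cov}(n) = \sum_{k=k_l}^{k_u} X^{H}(n,k)\,X(n,k),
\qquad
\mathrm{cov}_s(n) = \alpha\,\mathrm{cov}_s(n-1) + (1-\alpha)\,\mathrm{cov}(n),
\quad \alpha = 0.9
```

With X(n,k) taken as the row vector of channel spectra at bin k, each term X^H X is a square matrix with one row and one column per channel.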
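For 3D audio that contains height information, the microphone array is not located in a single plane, so applying the existing DOA method directly ignores the error caused by the height information and leads to inaccurate DOA detection results.
Summary of the Invention
In view of the shortcomings of the prior art, the object of the present invention is to provide a method and device for sound source localization. For audio signals in a preset format obtained through microphone arrays located in different planes, DOA detection is performed in combination with height information, which effectively improves the accuracy of DOA detection and solves the problem that the detection results of existing DOA methods are inaccurate.
The technical solution of the present invention is as follows:
A method for sound source localization, wherein the method comprises the following steps:
Step 1: obtain M channels of audio signals in a preset format through microphone arrays located in different planes, where M is a positive integer;
Step 2: preprocess the M channels of preset-format audio signals and project them onto a single plane to obtain N channels of audio signals, where N is a positive integer and M≥N;
Step 3: perform a time-frequency transform on each of the N processed audio channels to obtain the frequency-domain signals of the N channels;
Step 4: compute the covariance matrix of the frequency-domain signals and smooth the covariance matrix;
Step 5: perform eigenvalue decomposition on the smoothed covariance matrix to obtain N eigenvalues and the corresponding eigenvectors;
Step 6: estimate the sound source direction from the eigenvector corresponding to the largest eigenvalue to obtain the sound source orientation parameter.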
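Further, M=4 in step 1, and the preset-format audio signals are Ambisonic A-format audio signals, specifically four audio signals (LFU, RFD, LBD, RBU) located in different planes.
Further, the preprocessing in step 2 is specifically:
convert the four Ambisonic A-format audio signals, through a conversion matrix A, into three (N=3) audio signals (L, R, S) located in the same plane:
Further, the preprocessing in step 2 is:
convert the four Ambisonic A-format audio signals, through a conversion matrix A, into four (N=4) audio signals (F, R, B, L) located in the same plane:
Further, the preprocessing in step 2 is:
Step 21: convert the four Ambisonic A-format audio signals into Ambisonic B-format audio signals (W, X, Y, Z) through a conversion matrix A:
Step 22: estimate a divergence parameter based on the energy of the Z signal in the B-format audio signals;
Step 23: determine whether the divergence is greater than a set threshold;
Step 24: if so, use three (N=3) audio signals (L, R, S) to estimate the sound source direction;
if not, use four (N=4) audio signals to estimate the sound source direction.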
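Further, the time-frequency transform in step 3 can be implemented with the Discrete Fourier Transform (DFT), the Fast Fourier Transformation (FFT), or the Modified Discrete Cosine Transform (MDCT).
Further, the estimation of the sound source direction in step 6 is specifically:
take the inner product of the maximum eigenvector (the eigenvector of the largest eigenvalue) with the steering vectors and search for the index at which the inner product is largest; that index corresponds to the sound source direction.
Further, in step 3 the resulting frequency-domain signals are divided into several sub-bands;
step 4 computes and smooths the covariance matrix of each sub-band separately;
step 5 performs eigenvalue decomposition on the smoothed covariance matrix of each sub-band to obtain the N eigenvalues and corresponding eigenvectors of each sub-band covariance matrix;
step 6 estimates the sound source direction for each sub-band from the eigenvector corresponding to the largest eigenvalue and combines the per-sub-band direction results to obtain the sound source orientation parameter.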
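A device for sound source localization, the device comprising a preset-format audio signal acquisition unit, a signal preprocessing unit, a time-frequency transform unit, a frequency-domain signal processing unit, and a sound source orientation estimation unit, wherein:
the preset-format audio signal acquisition unit is configured to acquire M channels of audio signals in a preset format through microphone arrays located in different planes and to send the M channels of preset-format audio signals to the signal preprocessing unit;
the signal preprocessing unit is configured to preprocess the received M channels of preset-format audio signals, project them onto a single plane to obtain N channels of audio signals, and send the N channels of audio signals to the time-frequency transform unit;
the time-frequency transform unit is configured to perform a time-frequency transform on each of the received N audio channels to obtain the frequency-domain signals of the N channels;
the frequency-domain signal processing unit is configured to process the frequency-domain signals: compute their covariance matrix, smooth it, perform eigenvalue decomposition on the covariance matrix, and send the resulting eigenvalues and eigenvectors to the sound source orientation estimation unit;
the sound source orientation estimation unit is configured to estimate the sound source direction from the eigenvector corresponding to the largest of the eigenvalues to obtain the sound source orientation parameter.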
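The method and device of the present invention have the following advantages:
Performing DOA detection in combination with height information can effectively improve the accuracy of DOA detection. In addition, the divergence parameter estimated from the energy of the Z signal can be used both to perform DOA detection adaptively on the input multi-channel audio and to judge the accuracy of the DOA detection; this removes the error caused by height information and effectively improves the horizontal resolution.
FIG. 1 is a schematic flowchart of a method for sound source localization in a preferred embodiment of the present invention.
FIG. 2 is a schematic diagram of the four audio signals in a preferred embodiment of the present invention.
FIG. 3 is a schematic flowchart of a method for sound source localization in another preferred embodiment of the present invention.
FIG. 4 is a schematic flowchart of a method for sound source localization in another preferred embodiment of the present invention.
FIG. 5 is a schematic flowchart of a method for sound source localization in another preferred embodiment of the present invention.
FIG. 6 is a functional unit diagram of a device for sound source localization in a preferred embodiment of the present invention.
To help those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the scope of protection of the present invention.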
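Referring to FIG. 1, an embodiment of the present invention provides a method for sound source localization, the method comprising the following steps:
Step S100: obtain M channels of audio signals in a preset format through microphone arrays located in different planes.
In this embodiment, the M channels of preset-format audio signals may be four Ambisonic A-format audio signals (LFU, RFD, LBD, RBU); see FIG. 2.
Step S200: preprocess the M channels of preset-format audio signals and project them onto a single plane to obtain N channels of audio signals.
In this embodiment, referring to FIG. 3, the four Ambisonic A-format audio signals can be converted through a conversion matrix A into three (N=3) audio signals (L, R, S) located in the same plane:
Converting the Ambisonic A-format audio signals into LRS-format audio signals removes the error caused by height information and yields more accurate detection results.
In one embodiment of the present invention, referring to FIG. 4, the four Ambisonic A-format audio signals can also be converted through a conversion matrix A into four (N=4) audio signals (F, R, B, L) located in the same plane:
Using four-channel detection can effectively improve the horizontal resolution.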
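The entries of the conversion matrix are not reproduced in this text, so the following is only a minimal sketch of one way such a matrix could be built, not the patent's matrix. It assumes ideal cardioid capsules on a regular tetrahedron, the textbook A-format to B-format sign pattern, and hypothetical horizontal virtual-microphone azimuths for the L/R/S and F/R/B/L layouts; `planar_matrix`, `LRS_AZIMUTHS`, and `FRBL_AZIMUTHS` are names introduced here for illustration.

```python
import numpy as np

# Assumed capsule order: LFU, RFD, LBD, RBU (textbook A-to-B sign pattern).
A_TO_B = 0.5 * np.array([
    [1.0,  1.0,  1.0,  1.0],   # W: omnidirectional sum
    [1.0,  1.0, -1.0, -1.0],   # X: front minus back
    [1.0, -1.0,  1.0, -1.0],   # Y: left minus right
    [1.0, -1.0, -1.0,  1.0],   # Z: up minus down (given zero weight below)
])

def planar_matrix(azimuths_deg):
    """Return an (N, 4) matrix mapping A-format to N horizontal virtual cardioids.

    Azimuths are in degrees, counter-clockwise from the front (positive = left).
    """
    az = np.deg2rad(np.asarray(azimuths_deg, dtype=float))
    # Weights on (W, X, Y, Z); the sqrt(3) undoes the tetrahedral capsule tilt.
    decode = 0.5 * np.stack(
        [np.ones_like(az),
         np.sqrt(3.0) * np.cos(az),
         np.sqrt(3.0) * np.sin(az),
         np.zeros_like(az)],
        axis=1,
    )
    return decode @ A_TO_B  # weights on (LFU, RFD, LBD, RBU)

# Hypothetical layouts, not taken from the source.
LRS_AZIMUTHS = (60.0, -60.0, 180.0)        # L, R, S
FRBL_AZIMUTHS = (0.0, -90.0, 180.0, 90.0)  # F, R, B, L
```

Under these assumptions, `planar_matrix(LRS_AZIMUTHS) @ a_format` gives three coplanar channels whose gains for a horizontal source at azimuth θ are 0.5·(1 + cos(θ − azimuth)); the height component Z is simply dropped.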
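Step S300: perform a time-frequency transform on each of the N processed audio channels to obtain the frequency-domain signals of the N channels.
In this embodiment, the time-frequency transform can be implemented with the Discrete Fourier Transform (DFT), the Fast Fourier Transformation (FFT), or the Modified Discrete Cosine Transform (MDCT).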
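As an illustration of the channel-by-channel transform in step S300, here is a minimal sketch using a windowed FFT per frame. The frame length, hop size, Hann window, and the name `stft_frames` are assumptions chosen for the example; the source does not specify them.

```python
import numpy as np

def stft_frames(x, frame_len=1024, hop=512):
    """Windowed FFT of each frame of each channel.

    x: real array of shape (N, T), one row per audio channel.
    Returns a complex array of shape (N, n_frames, frame_len // 2 + 1),
    indexed as [channel, frame n, frequency bin k].
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    n_ch, n_samp = x.shape
    if n_samp < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (n_samp - frame_len) // hop
    window = np.hanning(frame_len)
    out = np.empty((n_ch, n_frames, frame_len // 2 + 1), dtype=complex)
    for n in range(n_frames):
        segment = x[:, n * hop:n * hop + frame_len] * window
        out[:, n, :] = np.fft.rfft(segment, axis=-1)
    return out
```

The FFT is used here because it is the usual fast way to evaluate the DFT; an MDCT would be a drop-in alternative at this stage but produces real coefficients.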
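Step S400: compute the covariance matrix of the frequency-domain signals and smooth the covariance matrix.
In this embodiment, the covariance matrix can be computed over a specific frequency band, or the whole band can be divided into sub-bands and the covariance matrix of each sub-band computed separately.
For a specific frequency band, the covariance matrix is computed as:
where n is the index of the audio frame in the audio signal; k is the index of the frequency bin in the frequency-domain signal; X(n,k) is the matrix formed by the values of the k-th frequency bin in the n-th frame, specifically X(n,k)=[X1(n,k) X2(n,k) … XN(n,k)], where Xi, i=1,2,...,N, are the frequency-domain signals of the audio channels; and kl and ku are respectively the start and cutoff frequency bins of the covariance matrix computation.
The smoothing is:
where α is the smoothing factor, which can be set to a fixed value, for example α=0.9, or chosen adaptively according to the characteristics of the audio signal.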
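A minimal sketch of step S400 follows, assuming the covariance is the sum over bins kl..ku of X^H(n,k)·X(n,k) with X(n,k) a row vector, and that smoothing is a first-order recursion in which α weights the previous frame. Both forms are assumptions consistent with the symbol definitions above; the function names are introduced for illustration.

```python
import numpy as np

def band_covariance(X_frame, k_lo, k_hi):
    """Covariance of one frame over bins k_lo..k_hi (inclusive).

    X_frame: complex array of shape (N, K), one row per channel.
    Returns the N x N Hermitian matrix sum_k X^H(n,k) X(n,k),
    i.e. entry (i, j) is sum_k conj(X_i) * X_j.
    """
    band = X_frame[:, k_lo:k_hi + 1]
    return band.conj() @ band.T

def smooth_covariance(cov_prev, cov_now, alpha=0.9):
    """Recursive smoothing; alpha is the weight kept from the previous frame."""
    return alpha * cov_prev + (1.0 - alpha) * cov_now
```

A caller would typically keep one running smoothed matrix per band and update it once per frame, for example `cov_s = smooth_covariance(cov_s, band_covariance(X[:, n, :], k_lo, k_hi))`.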
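Step S500: perform eigenvalue decomposition on the smoothed covariance matrix to obtain N eigenvalues and the corresponding eigenvectors.
Step S600: estimate the sound source direction from the eigenvector corresponding to the largest eigenvalue to obtain the sound source orientation parameter.
In this embodiment, the sound source direction can be estimated from the eigenvector corresponding to the largest eigenvalue as follows:
take the inner product of the maximum eigenvector with the steering vectors and search for the index at which the inner product is largest; that index corresponds to the sound source direction.
The steering vector is:
where K is the order of the steering vector, usually determined by the required localization accuracy.
For three audio channels, the values of pk, k=1,2,...,K are given by:
For four audio channels, the values of pk, k=1,2,...,K are given by:
The inner product D of the maximum eigenvector V and the steering vector P is: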
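Steps S500 and S600 can be sketched as below. The steering-vector formulas are not reproduced in this text, so the sketch assumes each coplanar channel behaves like a cardioid pointing at a known azimuth and uses the corresponding gains as pk; that model, the 0-based index k, and the names `steering_matrix` and `estimate_azimuth` are assumptions for illustration only.

```python
import numpy as np

def steering_matrix(mic_azimuths_deg, K=360):
    """Return a (K, N) array whose row k is the unit-norm steering vector p_k.

    Assumed model: channel i has gain 0.5 * (1 + cos(theta_k - azimuth_i)),
    with candidate directions theta_k = 2*pi*k/K for k = 0..K-1.
    """
    theta = 2.0 * np.pi * np.arange(K) / K
    mic = np.deg2rad(np.asarray(mic_azimuths_deg, dtype=float))
    P = 0.5 + 0.5 * np.cos(theta[:, None] - mic[None, :])
    return P / np.linalg.norm(P, axis=1, keepdims=True)

def estimate_azimuth(cov_s, P):
    """Pick the steering index that best matches the principal eigenvector.

    cov_s: smoothed N x N Hermitian covariance matrix.
    Returns (index k, azimuth in degrees).
    """
    eigvals, eigvecs = np.linalg.eigh(cov_s)   # eigenvalues in ascending order
    v_max = eigvecs[:, -1]                     # eigenvector of the largest eigenvalue
    D = np.abs(P @ v_max.conj())               # |inner product| for every k
    k_best = int(np.argmax(D))
    return k_best, 360.0 * k_best / P.shape[0]
```

Taking the magnitude of the inner product makes the search insensitive to the arbitrary sign or phase that an eigenvector routine may return.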
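In an embodiment of the present invention, the frequency-domain signals obtained in step S300 can also be divided into several sub-bands; step S400 computes and smooths the covariance matrix of each sub-band separately; step S500 performs eigenvalue decomposition on the smoothed covariance matrix of each sub-band to obtain the N eigenvalues and corresponding eigenvectors of each sub-band covariance matrix; and step S600 estimates the sound source direction for each sub-band from the eigenvector corresponding to the largest eigenvalue and combines the per-sub-band results to obtain the sound source orientation parameter.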
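The text does not say how the per-sub-band results are combined. The sketch below assumes equal-width sub-bands and a weighted circular mean of the per-band azimuths, where the weights could be, for example, each band's largest eigenvalue; both choices and both function names are assumptions, not the source's method.

```python
import numpy as np

def subband_edges(n_bins, n_bands):
    """Split bins 0..n_bins-1 into contiguous (k_lo, k_hi) ranges, inclusive."""
    edges = np.linspace(0, n_bins, n_bands + 1).astype(int)
    return [(int(lo), int(hi) - 1)
            for lo, hi in zip(edges[:-1], edges[1:]) if hi > lo]

def combine_azimuths(az_deg, weights=None):
    """Weighted circular mean of per-sub-band azimuths, in degrees.

    Averaging unit vectors instead of raw angles avoids the wrap-around
    problem near 0/360 degrees.
    """
    az = np.deg2rad(np.asarray(az_deg, dtype=float))
    w = np.ones_like(az) if weights is None else np.asarray(weights, dtype=float)
    z = np.sum(w * np.exp(1j * az))
    return float(np.rad2deg(np.angle(z)) % 360.0)
```

A histogram vote over the per-band indices would be another reasonable way to merge the results when several sources are active.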
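An embodiment of the present invention can also perform DOA detection adaptively on the four input Ambisonic A-format audio signals according to a divergence parameter. Referring to FIG. 5, the specific steps are as follows:
Step S100: obtain four Ambisonic A-format audio signals (LFU, RFD, LBD, RBU) through microphone arrays located in different planes.
Step S200: preprocess the four Ambisonic A-format audio signals, projecting them onto a single plane to obtain four B-format audio signals (W, X, Y, Z) in the same plane, and, based on the four B-format audio signals, decide whether to estimate the sound source direction with three (N=3) audio channels (L, R, S) or four (N=4) audio channels.
In this embodiment, the specific preprocessing steps are as follows:
Step S201: convert the four Ambisonic A-format signals into Ambisonic B-format audio (W, X, Y, Z) through a conversion matrix A:
Step S202: estimate a divergence parameter based on the energy of the Z signal in the B-format audio signals;
Step S203: determine whether the divergence is greater than a certain threshold, where the threshold is set empirically according to the scenario.
In an embodiment of the present invention, the threshold may take a value in the range [0.3, 0.6];
Step S204: if so, use three (N=3) audio signals (L, R, S) to estimate the sound source direction;
if not, use four (N=4) audio signals to estimate the sound source direction.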
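Steps S201 to S204 can be sketched as follows. The conversion matrix and the divergence formula are not reproduced in this text, so the sketch assumes the textbook A-format to B-format sign pattern and defines divergence as the share of directional energy carried by Z, which lies in [0, 1] and is therefore comparable with a threshold in [0.3, 0.6]. The default threshold of 0.45 and all function names are hypothetical.

```python
import numpy as np

# Assumed capsule order: LFU, RFD, LBD, RBU.
A_TO_B = 0.5 * np.array([
    [1.0,  1.0,  1.0,  1.0],   # W: omnidirectional sum
    [1.0,  1.0, -1.0, -1.0],   # X: front minus back
    [1.0, -1.0,  1.0, -1.0],   # Y: left minus right
    [1.0, -1.0, -1.0,  1.0],   # Z: up minus down
])

def a_to_b(a_format):
    """Convert A-format rows (LFU, RFD, LBD, RBU) of shape (4, T) to (W, X, Y, Z)."""
    return A_TO_B @ np.asarray(a_format, dtype=float)

def divergence_from_z(b_format, eps=1e-12):
    """Assumed definition: Z energy divided by the total X + Y + Z energy."""
    _, x, y, z = b_format
    e_z = np.sum(z ** 2)
    return float(e_z / (np.sum(x ** 2) + np.sum(y ** 2) + e_z + eps))

def choose_channel_count(divergence, threshold=0.45):
    """Step S204: 3 channels (L, R, S) above the threshold, otherwise 4."""
    return 3 if divergence > threshold else 4
```

With this definition, a source in the horizontal plane gives a divergence near 0 and selects the four-channel path, while a strongly elevated source pushes the divergence toward 1 and selects the three-channel path.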
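Step S300: perform a time-frequency transform on each of the N processed audio channels to obtain the frequency-domain signals of the N channels.
In this embodiment, the time-frequency transform can be implemented with the Discrete Fourier Transform (DFT), the Fast Fourier Transformation (FFT), or the Modified Discrete Cosine Transform (MDCT).
Step S400: compute the covariance matrix of the frequency-domain signals and smooth it.
In this embodiment, the covariance matrix can be computed over a specific frequency band, or the whole band can be divided into sub-bands and the covariance matrix of each sub-band computed separately.
Step S500: perform eigenvalue decomposition on the smoothed covariance matrix to obtain N eigenvalues and the corresponding eigenvectors.
Step S600: estimate the sound source direction from the eigenvector corresponding to the largest eigenvalue to obtain the sound source orientation parameter.
In this embodiment, the sound source direction is estimated from the eigenvector corresponding to the largest eigenvalue as follows:
take the inner product of the maximum eigenvector with the steering vectors and search for the index at which the inner product is largest; that index corresponds to the sound source direction.
In this embodiment, the divergence parameter can also serve as a reference for the reliability of the DOA result: when the divergence parameter is small, the DOA result is more reliable; when the divergence parameter is large, the DOA result is less reliable.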
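This embodiment performs DOA detection adaptively on the input multi-channel audio based on the divergence parameter estimated from the energy of the Z signal, which improves the accuracy of orientation detection at low complexity.
Referring to FIG. 6, a device for sound source localization comprises a preset-format audio signal acquisition unit 100, a signal preprocessing unit 200, a time-frequency transform unit 300, a frequency-domain signal processing unit 400, and a sound source orientation estimation unit 500.
The preset-format audio signal acquisition unit 100 is configured to acquire M channels of audio signals in a preset format through microphone arrays located in different planes and to send the M channels of preset-format audio signals to the signal preprocessing unit 200.
The signal preprocessing unit 200 is configured to preprocess the received M channels of preset-format audio signals, project them onto a single plane to obtain N channels of audio signals, and send the N channels of audio signals to the time-frequency transform unit 300.
The time-frequency transform unit 300 is configured to perform a time-frequency transform on each of the received N audio channels to obtain the frequency-domain signals of the N channels, and to send these frequency-domain signals to the frequency-domain signal processing unit 400.
The frequency-domain signal processing unit 400 is configured to process the frequency-domain signals of the N audio channels: compute their covariance matrix, smooth it, perform eigenvalue decomposition on the covariance matrix, and send the resulting eigenvalues and eigenvectors to the sound source orientation estimation unit 500.
The sound source orientation estimation unit 500 is configured to estimate the sound source direction from the eigenvector corresponding to the largest of the eigenvalues to obtain the sound source orientation parameter.
The device disclosed in this embodiment projects Ambisonic audio signals located in different planes onto a single plane and performs detection there, which can effectively improve the accuracy of DOA detection.
The above description of the various embodiments of the present invention is provided to those skilled in the art for purposes of description. It is not intended to be exhaustive or to limit the invention to a single disclosed embodiment. As noted above, various alternatives and variations of the present invention will be apparent to those skilled in the art to which the above technology pertains. Accordingly, although some alternative embodiments have been discussed specifically, other embodiments will be apparent or relatively easily derived by those skilled in the art. The present invention is intended to cover all alternatives, modifications, and variations of the invention discussed herein, as well as other embodiments that fall within the spirit and scope of the invention described above.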
Claims (10)
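- A method for sound source localization, characterized in that the method comprises the following steps: Step 1: obtain M channels of audio signals in a preset format through microphone arrays located in different planes, where M is a positive integer; Step 2: preprocess the M channels of preset-format signals and project them onto a single plane to obtain N channels of audio signals, where N is a positive integer and M≥N; Step 3: perform a time-frequency transform on each of the N processed audio channels to obtain the frequency-domain signals of the N channels; Step 4: compute the covariance matrix of the frequency-domain signals and smooth the covariance matrix; Step 5: perform eigenvalue decomposition on the smoothed covariance matrix to obtain N eigenvalues and the corresponding eigenvectors; Step 6: estimate the sound source direction from the eigenvector corresponding to the largest eigenvalue to obtain the sound source orientation parameter.
- The method for sound source localization according to claim 1, characterized in that M=4 in step 1, and the M channels of preset-format audio signals are Ambisonic A-format audio signals, specifically four audio signals (LFU, RFD, LBD, RBU) located in different planes.
- The method for sound source localization according to claim 1, characterized in that the time-frequency transform in step 3 can be implemented with the Discrete Fourier Transform (DFT), the Fast Fourier Transformation (FFT), or the Modified Discrete Cosine Transform (MDCT).
- The method for sound source localization according to claim 1, characterized in that the estimation of the sound source direction in step 6 is specifically: take the inner product of the maximum eigenvector with the steering vectors and search for the index at which the inner product is largest; that index corresponds to the sound source direction.
- The method for sound source localization according to claim 1, characterized in that in step 3 the resulting frequency-domain signals are divided into several sub-bands; step 4 computes and smooths the covariance matrices of the sub-bands separately; step 5 performs eigenvalue decomposition on the smoothed covariance matrix of each sub-band to obtain the N eigenvalues and corresponding eigenvectors of each sub-band covariance matrix; and step 6 estimates the sound source direction for each sub-band from the eigenvector corresponding to the largest eigenvalue and combines the per-sub-band direction results to obtain the sound source orientation parameter.
- A device for sound source localization, the device comprising a preset-format audio signal acquisition unit, a signal preprocessing unit, a time-frequency transform unit, a frequency-domain signal processing unit, and a sound source orientation estimation unit, characterized in that: the preset-format audio signal acquisition unit is configured to acquire M channels of audio signals in a preset format through microphone arrays located in different planes and to send the M channels of preset-format audio signals to the signal preprocessing unit, where M is a positive integer; the signal preprocessing unit is configured to preprocess the received M channels of preset-format audio signals, project them onto a single plane to obtain N channels of audio signals, and send the N channels of audio signals to the time-frequency transform unit, where N is a positive integer and M≥N; the time-frequency transform unit is configured to perform a time-frequency transform on each of the received N audio channels to obtain the frequency-domain signals of the N channels; the frequency-domain signal processing unit is configured to process the frequency-domain signals: compute their covariance matrix, smooth it, perform eigenvalue decomposition on the covariance matrix, and send the resulting eigenvalues and eigenvectors to the sound source orientation estimation unit; and the sound source orientation estimation unit is configured to estimate the sound source direction from the eigenvector corresponding to the largest of the eigenvalues to obtain the sound source orientation parameter.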
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2017/072014 WO2018133056A1 (zh) | 2017-01-22 | 2017-01-22 | Method and device for sound source localization |
US16/515,036 US10856094B2 (en) | 2017-01-22 | 2019-07-18 | Method and device for sound source localization |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2017/072014 WO2018133056A1 (zh) | 2017-01-22 | 2017-01-22 | Method and device for sound source localization |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/515,036 Continuation-In-Part US10856094B2 (en) | 2017-01-22 | 2019-07-18 | Method and device for sound source localization |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018133056A1 (zh) | 2018-07-26 |
Family
ID=62907503
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/072014 WO2018133056A1 (zh) | 2017-01-22 | 2017-01-22 | Method and device for sound source localization |
Country Status (2)
Country | Link |
---|---|
US (1) | US10856094B2 (zh) |
WO (1) | WO2018133056A1 (zh) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
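CN109541572A (zh) * | 2018-11-19 | 2019-03-29 | 西北工业大学 | Subspace bearing estimation method based on a linear ambient noise model |
CN109782245A (zh) * | 2018-12-31 | 2019-05-21 | 深圳市华讯方舟太赫兹科技有限公司 | Direction-of-arrival estimation method and device, radar, and readable storage medium |
CN109831731A (zh) * | 2019-02-15 | 2019-05-31 | 杭州嘉楠耘智信息科技有限公司 | Sound source orientation method and device, and computer-readable storage medium |
TWI714303B (zh) * | 2019-10-09 | 2020-12-21 | 宇智網通股份有限公司 | Sound source localization method and sound system |
WO2021093798A1 (zh) * | 2019-11-12 | 2021-05-20 | 乐鑫信息科技(上海)股份有限公司 | Method for selecting an output beam of a microphone array |
CN113138363A (zh) * | 2021-04-22 | 2021-07-20 | 苏州臻迪智能科技有限公司 | Sound source localization method and device, storage medium, and electronic device |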
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
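CN112216298B (zh) * | 2019-07-12 | 2024-04-26 | 大众问问(北京)信息科技有限公司 | Method, device, and equipment for sound source orientation with a dual-microphone array |
TWI811685B (zh) * | 2021-05-21 | 2023-08-11 | 瑞軒科技股份有限公司 | Conference room system and audio processing method |
CN115061087B (zh) * | 2022-05-27 | 2024-05-14 | 上海事凡物联网科技有限公司 | Signal processing method, DOA estimation method, and electronic device |
CN116047413B (zh) * | 2023-03-31 | 2023-06-23 | 长沙东玛克信息科技有限公司 | Precise audio localization method in an enclosed reverberant environment |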
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
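CN1832633A (zh) * | 2005-03-07 | 2006-09-13 | 华为技术有限公司 | Sound source localization method |
CN101272168A (zh) * | 2007-03-23 | 2008-09-24 | 中国科学院声学研究所 | Method for estimating the number of signal sources and direction-of-arrival estimation method thereof |
CN101957442A (zh) * | 2010-06-04 | 2011-01-26 | 河北工业大学 | Sound source localization device |
CN105828266A (zh) * | 2016-03-11 | 2016-08-03 | 苏州奇梦者网络科技有限公司 | Signal processing method and system for a microphone array |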
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8262236B2 (en) * | 2008-06-17 | 2012-09-11 | The Invention Science Fund I, Llc | Systems and methods for transmitting information associated with change of a projection surface |
JP2012150237A (ja) * | 2011-01-18 | 2012-08-09 | Sony Corp | Sound signal processing device, sound signal processing method, and program |
CN106157967A (zh) * | 2015-04-28 | 2016-11-23 | 杜比实验室特许公司 | Impulse noise suppression |
JP6915855B2 (ja) * | 2017-07-05 | 2021-08-04 | 株式会社オーディオテクニカ | Sound collection device |
- 2017-01-22: WO application PCT/CN2017/072014, published as WO2018133056A1 (zh); status: Application Filing
- 2019-07-18: US application US16/515,036, published as US10856094B2 (en); status: Active
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
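CN109541572A (zh) * | 2018-11-19 | 2019-03-29 | 西北工业大学 | Subspace direction estimation method based on a linear ambient noise model |
CN109541572B (zh) * | 2018-11-19 | 2022-05-03 | 西北工业大学 | Subspace direction estimation method based on a linear ambient noise model |
CN109782245A (zh) * | 2018-12-31 | 2019-05-21 | 深圳市华讯方舟太赫兹科技有限公司 | Direction-of-arrival estimation method and device, radar, and readable storage medium |
CN109782245B (zh) * | 2018-12-31 | 2020-12-25 | 深圳市华讯方舟太赫兹科技有限公司 | Direction-of-arrival estimation method and device, radar, and readable storage medium |
CN109831731A (zh) * | 2019-02-15 | 2019-05-31 | 杭州嘉楠耘智信息科技有限公司 | Sound source direction-finding method and device, and computer-readable storage medium |
TWI714303B (zh) * | 2019-10-09 | 2020-12-21 | 宇智網通股份有限公司 | Sound source localization method and sound system |
WO2021093798A1 (zh) * | 2019-11-12 | 2021-05-20 | 乐鑫信息科技(上海)股份有限公司 | Method for selecting an output beam of a microphone array |
CN113138363A (zh) * | 2021-04-22 | 2021-07-20 | 苏州臻迪智能科技有限公司 | Sound source localization method, device, storage medium, and electronic equipment |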
Also Published As
Publication number | Publication date |
---|---|
US20190342688A1 (en) | 2019-11-07 |
US10856094B2 (en) | 2020-12-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2018133056A1 (zh) | Method and device for sound source localization | |
US10901063B2 (en) | Localization algorithm for sound sources with known statistics | |
Wang et al. | Robust speaker localization guided by deep learning-based time-frequency masking | |
CA2815738C (en) | Apparatus and method for deriving a directional information and computer program product | |
US9578439B2 (en) | Method, system and article of manufacture for processing spatial audio | |
US9247343B2 (en) | Sound direction estimation device, sound processing system, sound direction estimation method, and sound direction estimation program | |
US9984702B2 (en) | Extraction of reverberant sound using microphone arrays | |
US9229086B2 (en) | Sound source localization apparatus and method | |
CN106251877A (zh) | Method and device for estimating the direction of a speech sound source | |
US8693287B2 (en) | Sound direction estimation apparatus and sound direction estimation method | |
CN105301563B (zh) | Dual sound source localization method based on the consistent focusing transform least squares method | |
US11289109B2 (en) | Systems and methods for audio signal processing using spectral-spatial mask estimation | |
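CN110610718B (zh) | Method and device for extracting the speech signal of a desired sound source | |
CN109188362B (zh) | Signal processing method for microphone array sound source localization | |
CN102565759A (zh) | Binaural sound source localization method based on sub-band signal-to-noise ratio estimation | |
JP2008054071A (ja) | Paper rustling noise removal device | |
CN104777450B (zh) | Two-stage MUSIC microphone array direction-finding method | |
CN106816156B (zh) | Method and device for audio quality enhancement | |
CN110890099B (zh) | Sound signal processing method, device, and storage medium | |
CN113093106A (zh) | Sound source localization method and system | |
CN109001678A (zh) | Thunder detection and localization method based on a three-dimensional microphone array | |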
Carabias-Orti et al. | Multi-source localization using a DOA Kernel based spatial covariance model and complex nonnegative matrix factorization | |
Sledevič et al. | An evaluation of hardware-software design for sound source localization based on SoC | |
Shiiki et al. | Omnidirectional sound source tracking based on sequential updating histogram | |
Karthik et al. | Subband Selection for Binaural Speech Source Localization. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 17893029; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: PCT application non-entry in European phase | Ref document number: 17893029; Country of ref document: EP; Kind code of ref document: A1 |