WO2011042808A1 - Signal separation system and signal separation method - Google Patents
Signal separation system and signal separation method
- Publication number
- WO2011042808A1 (PCT/IB2010/002660)
- Authority
- WO
- WIPO (PCT)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
Definitions
- the separated signal Q1(f, t) (an example of the internal noise separated signal), which is obtained by multiplying the transformed signal R1(f, t) containing only the power source noise S3(f, t) by the coefficient W33(f), is generated.
- ICA adaptively learns the separating matrix W(f) so that the separated signal Q1(f, t) is independent of the separated signals Y1(f, t) and Y2(f, t), so the separated signals Y1(f, t) and Y2(f, t) that do not contain the power source noise S3(f, t) are extracted (semi-blind signal separation).
- the separated signals Y1(f, t) and Y2(f, t) each correspond to a component other than the power source noise S3(f, t), that is, either the user voice S1(f, t) or the ambient noise S2(f, t).
- the gain correction unit 330 executes a gain correction process on the separating matrix W(f) at each frequency calculated by the independent component analysis unit 320.
- the permutation solving unit 340 executes a process for solving the permutation problem.
- FIG 3 is a block diagram of the permutation solving unit 340.
- among the separated signals Y1(f, t), Y2(f, t) and Q1(f, t) that are separated by the independent component analysis unit 320, Q1(f, t) is already identified as the power source noise S3(f, t); the remaining separated signals each correspond to a component other than the power source noise, that is, either the user voice S1(f, t) or the ambient noise S2(f, t).
- the objects of permutation solving are therefore the separated signals Y1(f, t) and Y2(f, t).
- the separated signals Y1(f, t) and Y2(f, t) are input to the permutation solving unit 340, and the separated signal Q1(f, t) is directly input to the subsequent inverse discrete Fourier transform unit 350.
- the permutation solving according to the present embodiment utilizes the fact that the probability density distribution of the user voice S1(f, t) has a spikier shape than the probability density distribution of the ambient noise S2(f, t). Furthermore, in order to estimate the spikedness (degree of peakedness) of the probability density distribution, the scale parameter ai(f) of a Laplacian distribution is used. Here, when the scale parameter ai(f) of the Laplacian distribution is estimated, the expected value of the absolute value of the separated signal Y(f, t) is utilized. Hereinafter, the description will be made sequentially.
- the permutation solving unit 340 includes a spikedness calculation unit 341 and a clustering determination unit 342.
- the spikedness calculation unit 341 calculates the spikedness of the probability density distribution (degree of peakedness of the distribution) of each of the separated signals Y1(f, t) and Y2(f, t).
- the scale parameter ai(f) of the Laplacian distribution obtained when the separated signal Yi(f, t) is fitted with a Laplacian distribution is used as the spikedness.
- the scale parameter ai(f) may be calculated through the following mathematical expression (5) using a maximum likelihood method: ai(f) = ⟨|Yi(f, t)|⟩.
- the separated signal Yi(f, t) is a complex spectrum, so |Yi(f, t)| denotes its magnitude, and ⟨·⟩ denotes the average of |Yi(f, t)| over time.
- FIG 4 is a schematic view that shows the flow of calculating the spikednesses (scale parameters ai(f)) from the observed signals x1(t), x2(t) and r1(t).
- a voice signal collected by the first external microphone 111 is the observed signal x1(t),
- a voice signal collected by the second external microphone 112 is the observed signal x2(t), and
- a signal detected by the internal sensor 120 is the observed signal r1(t).
- These observed signals are subjected to a discrete Fourier transform for each frame of a predetermined duration, and the results are the transformed signals X1(f, t), X2(f, t) and R1(f, t).
- the clustering determination unit 342 uses the thus calculated spikednesses (scale parameters ai(fk)) to label the separated signals Y1(fk, t) and Y2(fk, t) and, where necessary, interchanges the separated signals Y1(fk, t) and Y2(fk, t).
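The per-bin labeling and interchange described above can be sketched as follows. This is an illustrative reading, not the patent's implementation: the function name `align_bins`, the array shapes, and the spikedness scores are assumptions made for the example.

```python
import numpy as np

def align_bins(Y, spike):
    """For each frequency bin, put the spikier separated signal first,
    interchanging Y1(f, t) and Y2(f, t) where the labels come out reversed.

    Y:     (n_bins, 2, n_frames) separated spectra
    spike: (n_bins, 2) spikedness score per bin and per output
    """
    out = Y.copy()
    for f in range(Y.shape[0]):
        if spike[f, 1] > spike[f, 0]:   # voice landed in slot 1 at this bin
            out[f] = Y[f, ::-1]         # read from Y, write to out: no aliasing
    return out

# Toy data: 2 bins, 2 outputs, 3 frames.
Y = np.zeros((2, 2, 3))
Y[0, 0], Y[0, 1] = 5.0, 1.0             # bin 0: voice already first
Y[1, 0], Y[1, 1] = 1.0, 5.0             # bin 1: voice in slot 1
spike = np.array([[3.0, 0.1],
                  [0.1, 3.0]])
aligned = align_bins(Y, spike)
assert np.all(aligned[:, 0] == 5.0)     # voice occupies slot 0 in every bin
```

After this alignment, output index 0 refers to the same source at every frequency bin, which is exactly what the inverse discrete Fourier transform stage requires.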
Abstract
A signal separation system includes: an external microphone; an internal sensor that detects only an internal noise from an internal noise source present inside the system; a discrete Fourier transform unit (310) that performs a discrete Fourier transform on signals from the external microphone and the internal sensor; an independent component analysis unit (320) that performs independent component analysis on transformed signals that have been subjected to a discrete Fourier transform so that an internal noise separated signal that contains only the internal noise is extracted on the basis of the transformed signal of the signal detected by the internal sensor, and external separated signals that are independent of the internal noise separated signal and that do not contain the internal noise are extracted; and a permutation solving unit (340) that executes permutation solving on the external separated signals to extract a specific voice.
Description
SIGNAL SEPARATION SYSTEM AND SIGNAL SEPARATION METHOD
BACKGROUND OF THE INVENTION

1. Field of the Invention

[0001] The invention relates to a signal separation system and signal separation method that extract a specific signal in a state where a plurality of signals are mixed in a space and, more particularly, to a technique for solving a permutation problem.

2. Description of the Related Art
[0002] There is known independent component analysis (ICA), which separates and decodes a plurality of original signals on the basis of statistical independence when the plurality of original signals are linearly mixed with unknown coefficients (see Japanese Patent Application Publication No. 2004-145172 (JP-A-2004-145172)).
[0003] Where observed signals that are obtained by observing a plurality of original signals (sound sources) s(t) with a plurality of microphones are x(t), the observed signals x(t) are expressed by the mathematical expression (1).
[0004] In the ICA, signals S(f, t) are estimated through independent component analysis in the frequency domain by using signals X(f, t). The signals X(f, t) are obtained by transforming the observed signals x(t) into signals in the time-frequency domain through a short-time discrete Fourier transform. Here, the signals S(f, t) and X(f, t) are respectively obtained by performing a short-time discrete Fourier transform on the original signals s(t) and the observed signals x(t). The following mathematical expression (2), Y(f, t) = W(f)X(f, t), is considered to estimate the signals S(f, t) in the time-frequency domain. In the mathematical expression (2), Y(f, t) represents a column vector that has the kth output Yk(f, t) as its elements, and W(f) represents an n-by-n matrix (separating matrix) having Wij(f) as its elements.
[0005] Subsequently, the separating matrix W(f) by which the outputs Y1(f, t) to Yn(f, t) become statistically independent of one another (in practice, by which independence becomes maximum) when time t is varied while the frequency bin f is fixed is calculated. After statistically independent outputs Y1(f, t) to Yn(f, t) are obtained for all frequency bins f on the basis of the thus calculated separating matrix W(f), these outputs are subjected to an inverse Fourier transform, which makes it possible to obtain the separated signals y(t) in the time domain.
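As a rough sketch of how expression (2) is applied bin by bin, the following NumPy snippet multiplies each frequency bin's observed spectra by that bin's separating matrix. The function name, array layout, and identity-matrix demo are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def separate_per_bin(X, W):
    """Apply a separating matrix W(f) to the observed STFT X(f, t)
    independently for each frequency bin f.

    X: array (n_bins, n_sources, n_frames) of observed spectra X(f, t)
    W: array (n_bins, n_sources, n_sources) of separating matrices W(f)
    returns Y: array (n_bins, n_sources, n_frames) with Y(f, t) = W(f) X(f, t)
    """
    return np.einsum('fij,fjt->fit', W, X)

# Tiny demo: 2 bins, 2 sources, 4 frames, identity separation leaves X unchanged.
X = np.arange(16, dtype=float).reshape(2, 2, 4)
W = np.stack([np.eye(2), np.eye(2)])
Y = separate_per_bin(X, W)
assert np.allclose(Y, X)
```

In a real system the matrices W(f) would come from the ICA learning step; here they are identities only to keep the demo self-checking.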
[0006] However, in independent component analysis in the time-frequency domain, the signal separating process is performed for each frequency bin, and the relationship among the frequency bins is not considered. Therefore, even when separation of the signals is successful, there is a possibility that inconsistency of the separation destination occurs among the frequency bins. The inconsistency of the separation destination indicates a phenomenon in which, for example, the signal Y1 originates in the signal S1 at frequency bin f = 1, whereas the signal Y1 originates in the signal S2 at frequency bin f = 2; this is called the permutation problem.
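The permutation problem can be made concrete with a toy example (the numbers are hypothetical): per-bin ICA fixes each W(f) independently, so the row order of the recovered sources can differ between bins.

```python
import numpy as np

# Source spectra at some frequency bin: row 0 is source S1, row 1 is S2.
S = np.array([[1.0, 2.0, 3.0],
              [9.0, 8.0, 7.0]])
perm_bin1 = np.array([[1, 0], [0, 1]])  # bin f=1: rows come out in order
perm_bin2 = np.array([[0, 1], [1, 0]])  # bin f=2: rows come out swapped
Y1 = perm_bin1 @ S
Y2 = perm_bin2 @ S
assert np.array_equal(Y1[0], S[0])      # at bin 1, output 0 is source S1
assert np.array_equal(Y2[0], S[1])      # at bin 2, output 0 is source S2
```

Stitching these bins back together without relabeling would mix S1 and S2 into the same time-domain output, which is why a permutation solver is needed.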
[0007] JP-A-2004-145172 describes a method of solving the permutation problem in such a manner that the incoming directions of signals are estimated and then the signals are labeled on the basis of the directional information of each signal. However, actually, not all sound sources are simple sound sources, so it is not always possible to correctly estimate the incoming directions of the signals. For example, in the case of a diffusive noise, the direction of the noise cannot be identified, and, therefore, wrong labeling occurs.
[0008] In addition, WO/2009/113192 and the following non-patent document describe a method in which the joint probability density distribution of each of the separated signals is calculated and then the separated signals are divided into voice and noise on the basis of the shape of the joint probability density distribution. In this method, for example, a signal of which the joint probability density distribution is a non-Gaussian distribution is determined as a specific voice signal, and a signal of which the joint probability density distribution is a Gaussian distribution is determined as a noise signal. According to this method, even a diffusive noise is accurately labeled, so it is possible to determine the separation destination of a signal with high precision. The non-patent document is "An Improved permutation solver for blind signal separation based front-ends in robot audition" (Jani Even, Hiroshi Saruwatari, Kiyohiro Shikano), IEEE/RSJ International Conference on Intelligent Robotics and Systems (IROS2008), Nice, France, pp. 2172-2177, September 2008.
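A minimal sketch of this style of Gaussian-versus-non-Gaussian labeling follows. The cited documents use the shape of the joint probability density distribution; excess kurtosis is used here only as an illustrative proxy for non-Gaussianity, and the generated signals are synthetic stand-ins:

```python
import numpy as np

def excess_kurtosis(y):
    """Excess kurtosis: approximately 0 for Gaussian data, large and
    positive for spiky, voice-like data."""
    y = y - y.mean()
    return (y**4).mean() / (y**2).mean() ** 2 - 3.0

rng = np.random.default_rng(0)
noise = rng.normal(size=100_000)    # Gaussian stand-in for diffuse noise
voice = rng.laplace(size=100_000)   # heavy-tailed stand-in for voice
scores = {'noise': excess_kurtosis(noise), 'voice': excess_kurtosis(voice)}
# Label the most non-Gaussian separated signal as the voice.
label = max(scores, key=scores.get)
assert label == 'voice'
```

A Laplacian has excess kurtosis of about 3 while a Gaussian sits near 0, so the heavy-tailed signal is reliably picked out; the patent's point in [0010] below is that a spiky internal noise defeats exactly this kind of test.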
[0009] Here, the following case is assumed as an environment in which a signal separation system is actually used. FIG. 5 is a view that shows a robot 10 having a voice recognition function. The robot 10 includes a microphone array 12 formed of a plurality of microphones 11 and a signal separation device 20 that processes observed signals from the microphone array 12. An ambient noise S2 enters the microphone array 12 together with a user voice S1. Furthermore, the robot itself becomes a noise generating source.
That is, the robot 10 includes a power source 30, such as a motor, so a noise sound S3 from the power source 30 also enters the microphones 11.
[0010] Thus, the observed signals x(t) contain the noise S3 from the power source 30. The signal separation device 20 performs independent component analysis on signals that contain the user voice S1(f, t), the ambient noise S2(f, t) and the power source noise S3(f, t) to calculate statistically independent separated signals Y1(f, t) to Yn(f, t), and then labels the separated signals Y1(f, t) to Yn(f, t). However, if the signal of which the joint probability density distribution is a non-Gaussian distribution is simply determined as a user voice as described above, there is a possibility that wrong labeling occurs. This is because the noise S3 of the power source 30 also has a non-Gaussian joint probability density distribution having a high kurtosis.
[0011] In this way, when the existing method described in WO/2009/113192 and the above non-patent document is applied to an actual environment, there is a possibility that the labeling of a separated signal is wrong. Furthermore, the computation load for calculating a joint probability density distribution is high to begin with; if the shape of the joint probability density distribution of a power source noise must also be calculated in addition to the shapes of the joint probability density distributions of a user voice and an ambient noise, the computation load becomes excessively high.
SUMMARY OF INVENTION
[0012] A first aspect of the invention provides a signal separation system that separates an observed signal in the time domain, which mixedly contains a plurality of signals, into the plurality of signals using independent component analysis, and that extracts a specific voice from the separated signals. The signal separation system includes: an external microphone that is oriented outside of the signal separation system; an internal sensor that detects only an internal noise from an internal noise source present inside the signal separation system; a discrete Fourier transform unit that performs a discrete Fourier transform on signals from the external microphone and the internal sensor; an independent component analysis unit that performs independent component
analysis on transformed signals that have been subjected to a discrete Fourier transform by the discrete Fourier transform unit so that an internal noise separated signal that contains only the internal noise is extracted on the basis of the transformed signal of the signal detected by the internal sensor, and external separated signals that are independent of the internal noise separated signal and that do not contain the internal noise are extracted; and a permutation solving unit that executes permutation solving on the external separated signals to extract the specific voice.
[0013] In the first aspect of the invention, the permutation solving unit may include a spikedness calculation unit that calculates a spikedness, which is a degree of peakedness of probability density distribution of each of the external separated signals; and a clustering unit that labels the external separated signals as the specific voice or an ambient noise on the basis of the spikedness. In the above configuration, the spikedness calculation unit may calculate a scale parameter, as the spikedness, of Laplacian distribution when each of the external separated signals is subjected to fitting with Laplacian distribution.
[0014] In the above configuration, the clustering unit may determine the external separated signal having the largest spikedness as the specific voice.
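The spikedness calculation and labeling rule described in the two paragraphs above can be sketched as follows. This is a hedged reading, assuming (consistent with the maximum-likelihood fitting described later in the document) that the Laplacian scale estimate reduces to the time-averaged magnitude of the separated spectrum; the function name, sample sizes, and scale values are illustrative:

```python
import numpy as np

def laplace_scale(Y_f):
    """Maximum-likelihood scale of a zero-mean Laplacian fit: for
    p(y) = exp(-|y| / a) / (2a), the ML estimate of a is the time
    average of the magnitudes |Y(f, t)|."""
    return np.mean(np.abs(Y_f))

rng = np.random.default_rng(0)
# Two separated signals at one frequency bin (real-valued stand-ins
# for the complex spectra in the document).
Ya = rng.laplace(scale=2.0, size=100_000)   # larger scale: voice-like
Yb = rng.laplace(scale=0.5, size=100_000)   # smaller scale: noise-like
scales = [laplace_scale(Ya), laplace_scale(Yb)]
voice_index = int(np.argmax(scales))        # largest spikedness -> voice
assert voice_index == 0
```

The clustering unit's rule, determining the external separated signal with the largest spikedness as the specific voice, then reduces to a single `argmax` over these per-signal scale estimates.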
[0015] A second aspect of the invention provides a signal separation method. The signal separation method separates an observed signal in the time domain, which mixedly contains a plurality of signals and is observed in a system that includes an external microphone that is oriented outside of the system and an internal sensor that detects only an internal noise from an internal noise source present inside the system, into the plurality of signals using independent component analysis, and extracts a specific voice from the separated signals. The signal separation method includes: performing a discrete Fourier transform on signals from the external microphone and the internal sensor; performing independent component analysis on transformed signals that have been subjected to a discrete Fourier transform so that an internal noise separated signal that contains only the internal noise is extracted on the basis of the transformed signal of the signal detected by the internal sensor and external separated signals that are
independent of the internal noise separated signal and that do not contain the internal noise are extracted; and executing permutation solving on the external separated signals to extract the specific voice.
BRIEF DESCRIPTION OF DRAWINGS
[0016] The features, advantages, and technical and industrial significance of this invention will be described below with reference to the accompanying drawings, in which like numerals denote like elements, and wherein:
FIG 1 is a view that shows a robot equipped with a signal separation device 200 according to an embodiment of the invention;
FIG 2 is a block diagram of the signal separation device 200;
FIG 3 is a block diagram of a permutation solving unit 340;
FIG 4 is a schematic view of the flow of calculating a spikedness (scale parameter ai(f)) from each of observed signals x1(t) and x2(t); and
FIG 5 is a view that shows a robot 10 having a voice recognition function.
DETAILED DESCRIPTION OF EMBODIMENTS
[0017] Embodiments of the invention are illustrated in the drawings, and will be described with reference to the reference numerals assigned to components in the drawings. FIG 1 is a view that shows a robot equipped with a signal separation device according to the first embodiment. The robot 100 includes external microphones 110, an internal sensor 120 and a signal separation device 200.
[0018] The external microphones 110 are each a sound collecting microphone provided on the body surface of the robot 100. Here, for the sake of description, it is assumed that a first external microphone 111 and a second external microphone 112 are provided. At this time, the external microphones 110 receive a voice S1 from a user and a noise S2 from around the external microphones 110. In addition, the external microphones 110 also receive a noise S3 from a power source 30.
[0019] The internal sensor 120 exclusively detects the noise S3 from the power
source 30. The internal sensor 120 detects the noise from the power source 30, but the internal sensor 120 does not detect a sound signal (Si or S2) from the outside. It is desirable that the internal sensor 120 is, for example, arranged at a location in proximity to the external microphones 110, such as the back side of the external microphones 110. Such a sensor that exclusively detects the noise S3 from the power source 30 may be, for example, an acceleration sensor or a microphone having high directivity.
[0020] Note that the number of the external microphones 110 and the number of the internal sensors 120 are not limited, and are increased or reduced where necessary. For example, when a plurality of the external microphones 110 are provided, the internal sensor 120 may be provided in one-to-one correspondence with each external microphone 110.
[0021] Here, the user voice is denoted by S1(f, t), the ambient noise is denoted by S2(f, t) and the power source noise is denoted by S3(f, t). In addition, a signal observed by the first external microphone 111 is denoted by X1(f, t), a signal observed by the second external microphone 112 is denoted by X2(f, t) and a signal observed by the internal sensor 120 is denoted by R1(f, t). At this time, the relationship between the original signals and the observed signals may be expressed by the following mathematical expression (3) using an unknown coefficient matrix A(f): [X1(f, t), X2(f, t), R1(f, t)]^T = A(f) [S1(f, t), S2(f, t), S3(f, t)]^T.
[0022] Here, the first external microphone 111 and the second external microphone 112 receive the user voice S1(f, t), the ambient noise S2(f, t) and the power source noise S3(f, t), so the components (A11(f), A12(f), A13(f), A21(f), A22(f) and A23(f)) of the coefficient matrix A(f) corresponding to the observed signals X1(f, t) and X2(f, t) are not 0. In contrast, the internal sensor 120 does not receive the user voice S1(f, t) or the ambient noise S2(f, t), so the components of the coefficient matrix A(f) corresponding to the observed signal R1(f, t) are 0 except the coefficient A33(f) corresponding to the power source noise S3(f, t).
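The zero pattern of A(f) described above can be illustrated with a short numerical sketch (assuming NumPy; the matrix values here are arbitrary illustrations, not taken from the embodiment): because A31(f) = A32(f) = 0, the internal sensor observation R1 depends on the power source noise S3 alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3x3 mixing matrix A(f) for a single frequency bin.
# Rows: observations X1, X2 (external microphones) and R1 (internal sensor);
# columns: sources S1 (user voice), S2 (ambient noise), S3 (power source noise).
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A[2, 0] = 0.0  # internal sensor does not pick up the user voice S1
A[2, 1] = 0.0  # ...nor the ambient noise S2; only A33 stays nonzero

S = rng.standard_normal((3, 100)) + 1j * rng.standard_normal((3, 100))
X = A @ S  # observed signals per expression (3): X1, X2 and R1

# R1 is a scaled copy of S3 alone.
assert np.allclose(X[2], A[2, 2] * S[2])
```

It is this structural zero in the last row that the semi-blind separation below exploits.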
[0023] FIG 2 is a block diagram of the signal separation device 200. The signal separation device 200 includes an analog/digital (A/D) conversion unit 210, a noise suppressing unit 300 and a voice recognition unit 220.
[0024] The A/D conversion unit 210 converts respective signals input from the external microphones 110 and the internal sensor 120 into digital signals and then outputs the digital signals to the noise suppressing unit 300.
[0025] The noise suppressing unit 300 executes a process of suppressing the noise contained in the input digital signals. The noise suppressing unit 300 includes a short-time discrete Fourier transform unit 310, an independent component analysis unit 320, a gain correction unit 330, a permutation solving unit 340 and an inverse discrete Fourier transform unit 350.
[0026] The short-time discrete Fourier transform unit 310 performs a short-time discrete Fourier transform on pieces of digital data input from the A/D conversion unit 210.
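As a rough sketch of what the short-time discrete Fourier transform unit 310 computes (assuming NumPy; the Hann window and the frame and hop lengths are our arbitrary choices, not fixed by the embodiment):

```python
import numpy as np

def stft(x, frame_len=512, hop=128):
    """Short-time discrete Fourier transform: one DFT per windowed frame.

    Returns an array of shape (n_bins, n_frames), i.e. indexed as X[f, t].
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T  # rows: frequency bins, columns: frames

x = np.random.default_rng(1).standard_normal(4096)
X = stft(x)  # X[f, t]: the time-frequency representation used below
```

Each column of the result is one frame t and each row one frequency bin f, matching the X(f, t) indexing used in the following paragraphs.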
[0027] The independent component analysis unit 320 performs independent component analysis (ICA) on the observed signals expressed in the time-frequency domain, obtained by the short-time discrete Fourier transform unit 310, and then calculates a separating matrix for each frequency bin. The detailed process of independent component analysis is, for example, described in JP-A-2004-145172.
[0028] Here, the observed signals x1(t), x2(t) and r1(t) are each subjected to a short-time discrete Fourier transform, and the obtained signals (hereinafter also referred to as "transformed signals") are denoted by X1(f, t), X2(f, t) and R1(f, t). Then, statistically independent separated signals (hereinafter also referred to as "separated signals") Y1(f, t), Y2(f, t) and Q1(f, t) are extracted on the basis of the following mathematical expression (4) using the separating matrix W(f):

(Y1(f, t), Y2(f, t), Q1(f, t))^T = W(f) (X1(f, t), X2(f, t), R1(f, t))^T ... (4)
[0029] In the present embodiment, the separated signal Q1(f, t) (an example of an internal noise separated signal), which is obtained by multiplying the transformed signal R1(f, t) containing only the power source noise S3(f, t) by the coefficient W33(f), is generated. ICA adaptively learns the separating matrix W(f) so that the separated signal Q1(f, t) is independent of the separated signals Y1(f, t) and Y2(f, t), so the separated signals Y1(f, t) and Y2(f, t) that do not contain the power source noise S3(f, t) are extracted (semi-blind signal separation). That is, the separated signals Y1(f, t) and Y2(f, t) are each a component corresponding to something other than the power source noise S3(f, t), that is, one of the user voice S1(f, t) and the ambient noise S2(f, t).
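A minimal sketch of the semi-blind learning described above, for a single frequency bin (assuming NumPy; the natural-gradient update rule, step size and score function are our illustrative choices, and `semi_blind_ica_bin` is a hypothetical name). The key point is that the last row of W(f) is re-constrained to W31 = W32 = 0 after every update, so the output Q1 is always formed from R1 alone:

```python
import numpy as np

def semi_blind_ica_bin(X, n_iter=200, mu=0.1):
    """Natural-gradient ICA for one frequency bin (illustrative sketch).

    X: (3, T) complex observations (X1, X2, R1) for this bin.
    Returns a 3x3 separating matrix W with W[2, 0] = W[2, 1] = 0,
    i.e. the internal-noise output Q1 is formed from R1 alone.
    """
    n, T = X.shape
    W = np.eye(n, dtype=complex)
    for _ in range(n_iter):
        Y = W @ X
        # Score function for super-Gaussian (Laplacian-like) sources.
        phi = Y / (np.abs(Y) + 1e-9)
        dW = (np.eye(n) - (phi @ Y.conj().T) / T) @ W
        W = W + mu * dW
        # Re-impose the semi-blind constraint after each update.
        W[2, 0] = 0.0
        W[2, 1] = 0.0
    return W
```

Because the constrained row keeps Q1 tied to the internal sensor, the learning pressure toward mutual independence pushes the internal-noise component out of Y1 and Y2, as described in paragraph [0029].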
[0030] The gain correction unit 330 executes gain correction process on a separating matrix W(f) at each frequency calculated by the independent component analysis unit 320.
[0031] The permutation solving unit 340 executes a process for solving the permutation problem. FIG 3 is a block diagram of the permutation solving unit 340. Here, in the present embodiment, among the separated signals Y1(f, t), Y2(f, t) and Q1(f, t) that are separated by the independent component analysis unit 320, the separated signals Y1(f, t) and Y2(f, t) are already identified as corresponding to components other than the power source noise S3(f, t), that is, one of the user voice S1(f, t) and the ambient noise S2(f, t). Thus, the object of permutation solving is the separated signals Y1(f, t) and Y2(f, t). The separated signals Y1(f, t) and Y2(f, t) are input to the permutation solving unit 340, and the separated signal Q1(f, t) is directly input to the subsequent inverse discrete Fourier transform unit 350.
[0032] Then, the permutation solving according to the present embodiment utilizes the fact that the probability density distribution of the user voice S1(f, t) has a shape that is spikier than the probability density distribution of the ambient noise S2(f, t). Furthermore, in order to estimate the spikedness (degree of peakedness) of the probability density distribution, the scale parameter αi(f) of a Laplacian distribution is used. Here, when the scale parameter αi(f) of the Laplacian distribution is estimated, the expected value of the absolute value of the separated signal Yi(f, t) is utilized. Hereinafter, the description will be made sequentially.
[0033] The permutation solving unit 340 includes a spikedness calculation unit 341 and a clustering determination unit 342.
[0034] The spikedness calculation unit 341 calculates the spikedness of the probability density distribution (degree of peakedness of the distribution) of each of the separated signals Y1(f, t) and Y2(f, t). The scale parameter αi(f) of a Laplacian distribution when the separated signal Yi(f, t) is subjected to fitting with a Laplacian distribution is used as the spikedness. Then, the scale parameter αi(f) may be calculated through the following mathematical expression (5) using a maximum likelihood method:

αi(f) = Et{|Yi(f, t)|} ... (5)
[0035] Here, the separated signal Yi(f, t) is a complex spectrum, so |Yi(f, t)| means the absolute value of a complex number. In addition, Et{|Yi(f, t)|} means the average of |Yi(f, t)| over a predetermined number of frames.
[0036] Here, FIG 4 is a schematic view that shows the flow of calculating the spikednesses (scale parameters αi(f)) from the observed signals x1(t), x2(t) and r1(t). A voice signal collected by the first external microphone 111 is the observed signal x1(t), a voice signal collected by the second external microphone 112 is the observed signal x2(t) and a signal detected by the internal sensor 120 is the observed signal r1(t). These observed signals are subjected to a discrete Fourier transform for each frame of a predetermined duration, and the results are the transformed signals X1(f, t), X2(f, t) and R1(f, t). The results of independent component analysis on the transformed signals X1(f, t), X2(f, t) and R1(f, t) are the separated signals Y1(f, t), Y2(f, t) and Q1(f, t). At this time, the spikedness (scale parameter αi(fk)) for the frequency bin f = fk is, for example, expressed by the following mathematical expression (6) using the frames t0 to t2:

αi(fk) = (1 / (t2 − t0 + 1)) Σ_{t=t0}^{t2} |Yi(fk, t)| ... (6)
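Expressions (5) and (6) reduce to a time average of magnitudes, which can be sketched as follows (assuming NumPy; `laplacian_scale` is a hypothetical name, and the real-valued Laplacian sample is used only to sanity-check the estimator):

```python
import numpy as np

def laplacian_scale(Y, t0, t2):
    """ML estimate of the Laplacian scale parameter for one frequency bin:
    the average of |Y(fk, t)| over the frames t0..t2 (expressions (5), (6))."""
    return np.mean(np.abs(Y[t0:t2 + 1]))

# Sanity check: for samples drawn from a Laplacian distribution with
# scale 1.0, the estimate should come out close to 1.0.
rng = np.random.default_rng(2)
samples = rng.laplace(scale=1.0, size=5000)
a_est = laplacian_scale(samples, 0, 4999)
```

In the embodiment this is evaluated per frequency bin on the complex separated signals, where `np.abs` takes the magnitude of a complex number as stated in paragraph [0035].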
[0037] The clustering determination unit 342 uses the thus calculated spikednesses (scale parameters αi(fk)) to label the separated signals Y1(fk, t) and Y2(fk, t), and, where necessary, interchanges the separated signals Y1(fk, t) and Y2(fk, t). That is, one of the separated signals Y1(fk, t) and Y2(fk, t) is determined to correspond to the user voice S1(f, t), the other one is determined to correspond to the ambient noise S2(f, t), and then the sorting destinations of the user voice S1(f, t) and the ambient noise S2(f, t) are standardized at every frequency bin. Specifically, the one that has the largest spikedness (scale parameter αi(fk)) is determined to correspond to the user voice.
[0038] For example, when the user voice is sorted to index number 1 and the ambient noise is sorted to index number 2, the process will be as follows.
(Case 1) The case where α1(fk) ≥ α2(fk) is assumed as case 1. In this case, it may be determined that the separated signal Y1(fk, t) corresponds to the user voice S1(fk, t) and the separated signal Y2(fk, t) corresponds to the ambient noise S2(fk, t). In this case, interchanging is not required.
[0039] (Case 2) The case where α1(fk) < α2(fk) is assumed as case 2. In this case, it may be determined that the separated signal Y2(fk, t) corresponds to the user voice S1(fk, t) and the separated signal Y1(fk, t) corresponds to the ambient noise S2(fk, t). In this case, at the frequency bin fk, the separated signals Y1(fk, t) and Y2(fk, t) are interchanged.
[0040] Such clustering is executed at every frequency bin.
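The per-bin decision in cases 1 and 2 amounts to a conditional swap across frequency bins, which could look like this (assuming NumPy; the array shapes and the name `solve_permutation` are ours, not the patent's):

```python
import numpy as np

def solve_permutation(Y1, Y2, alpha1, alpha2):
    """Standardize the sorting destinations of the two external separated
    signals at every frequency bin.

    Y1, Y2: (n_bins, n_frames) separated signals; alpha1, alpha2: (n_bins,)
    spikednesses. On return, row index 1 holds the larger-spikedness signal
    (user voice) and index 2 the other (ambient noise) at every bin.
    """
    out1, out2 = Y1.copy(), Y2.copy()
    swap = alpha1 < alpha2            # case 2: interchange at these bins
    out1[swap], out2[swap] = Y2[swap], Y1[swap]
    return out1, out2
```

The boolean mask applies case 1 (no interchange) and case 2 (interchange) bin by bin in a single vectorized step.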
[0041] Lastly, the inverse discrete Fourier transform unit 350 performs an inverse discrete Fourier transform, transforms the data Y1(f, t), Y2(f, t) and Q1(f, t) in the time-frequency domain into data in the time domain, and then outputs the data.
[0042] With the above configuration, the following advantageous effects may be obtained.
(1) The internal sensor 120 that exclusively detects only the noise from the internal noise source (power source) 30 is provided. Then, independent component analysis optimizes the separated signal Q1(f, t) for estimating the internal noise and the other separated signals Y1(f, t) and Y2(f, t) so as to be independent of each other. The separated signal Q1(f, t) is generated from only the transformed signal R1(f, t) from the internal sensor 120, so the internal noise is definitely output to the separated signal Q1(f, t). If the internal noise is contained in the separated signals Y1(f, t) and Y2(f, t), correlation occurs, so those components are removed through optimization in ICA. Thus, the internal noise is output to only the separated signal Q1(f, t). By so doing, one of the separated signals Y1(f, t) and Y2(f, t) other than the separated signal Q1(f, t) corresponds to the user voice. That is, it is only necessary to solve the permutation problem for the separated signals Y1(f, t) and Y2(f, t) other than the separated signal Q1(f, t). Thus, it is possible to reduce the calculation load for permutation solving.
[0043] (2) The noise from the internal noise source (power source) 30 is similar to the user voice in the high degree of peakedness of its probability density distribution, or the like, and, therefore, it may be difficult to solve the permutation problem between the internal noise and the user voice. In the present embodiment, the sensor that detects only the internal noise is utilized, and the components W31(f) and W32(f) of the separating matrix W(f) are modeled as 0. By so doing, the internal noise is concentrated in the separated signal Q1(f, t), and is not contained in the remaining separated signals Y1(f, t) and Y2(f, t). Thus, it is possible to improve the accuracy of separating and extracting the user voice.
[0044] (3) In the present embodiment, in labeling, the spikedness of the probability density distribution (degree of peakedness of the distribution) of each of the separated signals Y1(f, t) and Y2(f, t) is used, and, in addition, the scale parameter αi(f) of a Laplacian distribution when the separated signal Yi(f, t) is subjected to fitting with a Laplacian distribution is used as the spikedness. With the above method, it is possible to remarkably reduce the calculation load.
[0045] Note that the aspect of the invention is not limited to the embodiment described above; it may be appropriately modified without departing from the scope of the invention. For example, in the above embodiment, the robot 100 is equipped with the signal separation device 200; instead, the aspect of the invention may be applied to a voice recognition system of an automobile, a telephone, or the like.
Claims
1. A signal separation system that separates an observed signal in the time domain, which mixedly contains a plurality of signals, into the plurality of signals using independent component analysis, and that extracts a specific voice from the separated signals, the signal separation system comprising:
an external microphone that is oriented outside of the signal separation system; an internal sensor that detects only an internal noise from an internal noise source present inside the signal separation system;
a discrete Fourier transform unit that performs a discrete Fourier transform on signals from the external microphone and the internal sensor;
an independent component analysis unit that performs independent component analysis on transformed signals that have been subjected to a discrete Fourier transform by the discrete Fourier transform unit so that an internal noise separated signal that contains only the internal noise is extracted on the basis of the transformed signal of the signal detected by the internal sensor and external separated signals that are independent of the internal noise separated signal and that do not contain the internal noise are extracted; and
a permutation solving unit that executes permutation solving on the external separated signals to extract the specific voice.
2. The signal separation system according to claim 1, wherein the permutation solving unit includes
a spikedness calculation unit that calculates a spikedness, which is a degree of peakedness of probability density distribution of each of the external separated signals; and
a clustering unit that labels the external separated signals as the specific voice or an ambient noise on the basis of the spikedness.
3. The signal separation system according to claim 2, wherein the spikedness calculation unit calculates a scale parameter, as the spikedness, of Laplacian distribution when each of the external separated signals is subjected to fitting with Laplacian distribution.
4. The signal separation system according to claim 3, wherein
the spikedness calculation unit calculates an expected value of an absolute value of each of the external separated signals as a maximum likelihood value of the scale parameter.
5. The signal separation system according to claim 3 or 4, wherein
the spikedness calculation unit calculates the scale parameter αi(f) using the following mathematical expression when a separated signal is denoted by Yi(f, t):
αi(f) = Et{|Yi(f, t)|}
where Et{|Yi(f, t)|} represents an average of |Yi(f, t)| in a predetermined number of frames.
6. The signal separation system according to any one of claims 2 to 5, wherein the clustering unit labels the external separated signal having the largest spikedness as the specific voice.
7. The signal separation system according to claim 1, wherein
the independent component analysis unit performs independent component analysis using a separating matrix in which a component corresponding to the internal noise, among internal components that are components corresponding to the transformed signal of the signal detected by the internal sensor, is not zero and the other internal components are zero.
8. The signal separation system according to claim 7, wherein the independent component analysis unit adaptively learns the separating matrix so that the internal noise separated signal is independent of the external separated signals.
9. The signal separation system according to claim 1, wherein
the internal sensor is arranged on a back side of the external microphone.
10. A signal separation method that separates an observed signal in the time domain, which mixedly contains a plurality of signals and is observed in a system that includes an external microphone that is oriented outside of the system and an internal sensor that detects only an internal noise from an internal noise source present inside the system, into the plurality of signals using independent component analysis, and that extracts a specific voice from the separated signals, the signal separation method comprising:
performing a discrete Fourier transform on signals from the external microphone and the internal sensor;
performing independent component analysis on transformed signals that have been subjected to a discrete Fourier transform so that an internal noise separated signal that contains only the internal noise is extracted on the basis of the transformed signal of the signal detected by the internal sensor and external separated signals that are independent of the internal noise separated signal and that do not contain the internal noise are extracted; and
executing permutation solving on the external separated signals to extract the specific voice.
Citations
- JP 2004-145172 A (Nippon Telegraph and Telephone Corporation (NTT); priority 2002-10-28, published 2004-05-20) — Method, apparatus and program for blind signal separation, and recording medium where the program is recorded
- WO 2009/113192 A1 (Toyota Motor Corporation; priority 2008-03-11, published 2009-09-17) — Signal separating apparatus and signal separating method