
WO2011042808A1 - Signal separation system and signal separation method - Google Patents

Signal separation system and signal separation method Download PDF

Info

Publication number
WO2011042808A1
WO2011042808A1 (PCT/IB2010/002660)
Authority
WO
WIPO (PCT)
Prior art keywords
signal
signals
separated
internal
noise
Prior art date
Application number
PCT/IB2010/002660
Other languages
French (fr)
Other versions
WO2011042808A8 (en)
Inventor
Tomoya Takatani
Jani Even
Original Assignee
Toyota Jidosha Kabushiki Kaisha
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toyota Jidosha Kabushiki Kaisha filed Critical Toyota Jidosha Kabushiki Kaisha
Publication of WO2011042808A1 publication Critical patent/WO2011042808A1/en
Publication of WO2011042808A8 publication Critical patent/WO2011042808A8/en

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 - Voice signal separating

Definitions

  • the observed signals x1(t), x2(t) and r1(t) are each subjected to a short-time discrete Fourier transform, and the resulting signals (hereinafter also referred to as “transformed signals”) are denoted by X1(f, t), X2(f, t) and R1(f, t).
  • statistically independent separated signals (hereinafter also referred to as “separated signals”) Y1(f, t), Y2(f, t) and Q1(f, t) are extracted on the basis of the following mathematical expression (4) using the separating matrix W(f):

$$\begin{pmatrix} Y_1(f,t) \\ Y_2(f,t) \\ Q_1(f,t) \end{pmatrix} = \begin{pmatrix} W_{11}(f) & W_{12}(f) & W_{13}(f) \\ W_{21}(f) & W_{22}(f) & W_{23}(f) \\ 0 & 0 & W_{33}(f) \end{pmatrix} \begin{pmatrix} X_1(f,t) \\ X_2(f,t) \\ R_1(f,t) \end{pmatrix} \tag{4}$$
  • the separated signal Q1(f, t) (an example of the internal noise separated signal), obtained by multiplying the transformed signal R1(f, t), which contains only the power source noise S3(f, t), by the coefficient W33(f), is generated.
  • ICA adaptively learns the separating matrix W(f) so that the separated signal Q1(f, t) is independent of the separated signals Y1(f, t) and Y2(f, t); as a result, the separated signals Y1(f, t) and Y2(f, t), which do not contain the power source noise S3(f, t), are extracted (semi-blind signal separation).
  • the separated signals Y1(f, t) and Y2(f, t) are components other than the power source noise S3(f, t); that is, each corresponds to either the user voice S1(f, t) or the ambient noise S2(f, t).
  • the gain correction unit 330 executes a gain correction process on the separating matrix W(f) calculated for each frequency by the independent component analysis unit 320.
  • the permutation solving unit 340 executes a process for solving the permutation problem.
  • FIG 3 is a block diagram of the permutation solving unit 340.
  • of the separated signals Y1(f, t), Y2(f, t) and Q1(f, t) produced by the independent component analysis unit 320, the signal Q1(f, t) is already identified as the power source noise S3(f, t); the remaining signals Y1(f, t) and Y2(f, t) each correspond to either the user voice S1(f, t) or the ambient noise S2(f, t).
  • accordingly, the objects of permutation solving are the separated signals Y1(f, t) and Y2(f, t).
  • the separated signals Y1(f, t) and Y2(f, t) are input to the permutation solving unit 340, whereas the separated signal Q1(f, t) is directly input to the subsequent inverse discrete Fourier transform unit 350.
  • the permutation solving according to the present embodiment utilizes the fact that the probability density distribution of the user voice S1(f, t) has a spikier shape than the probability density distribution of the ambient noise S2(f, t). In order to estimate the spikedness (degree of peakedness) of a probability density distribution, the scale parameter αi(f) of a Laplacian distribution is used. When the scale parameter αi(f) is estimated, the expected value of the absolute value of the separated signal Yi(f, t) is utilized. Hereinafter, the description proceeds step by step.
  • the permutation solving unit 340 includes a spikedness calculation unit 341 and a clustering determination unit 342.
  • the spikedness calculation unit 341 calculates the spikedness of the probability density distribution (degree of peakedness of the distribution) of each of the separated signals Y1(f, t) and Y2(f, t).
  • the scale parameter αi(f) of the Laplacian distribution obtained when the separated signal Yi(f, t) is fitted with a Laplacian distribution is used as the spikedness.
  • the scale parameter αi(f) may be calculated through the following mathematical expression (5) using a maximum likelihood method:

$$\alpha_i(f) = \frac{1}{T} \sum_{t=1}^{T} \left| Y_i(f,t) \right| \tag{5}$$

  • because the separated signal Yi(f, t) is a complex spectrum, |Yi(f, t)| denotes its magnitude, and expression (5) is the time average of |Yi(f, t)|.
  • FIG 4 is a schematic view that shows the flow of calculating the spikednesses (scale parameters αi(f)) from the observed signals x1(t), x2(t) and r1(t).
  • a voice signal collected by the first external microphone 111 is the observed signal x1(t)
  • a voice signal collected by the second external microphone 112 is the observed signal x2(t)
  • a signal detected by the internal sensor 120 is the observed signal r1(t).
  • these observed signals are subjected to a discrete Fourier transform for each frame of a predetermined duration, and the results are the transformed signals X1(f, t), X2(f, t) and R1(f, t).
  • the clustering determination unit 342 uses the thus calculated spikednesses (scale parameters αi(fk)) to label the separated signals Y1(fk, t) and Y2(fk, t) and, where necessary, interchanges the separated signals Y1(fk, t) and Y2(fk, t).
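The per-bin labeling and interchange described above can be sketched in a few lines. This is an illustrative reading of the spikedness-based permutation solver, not the patented implementation: the function name, the array layout, and the use of the time-averaged magnitude as the Laplacian scale parameter αi(f) (per expression (5)) are assumptions made for the sketch, and the signal with the largest spikedness is placed first, following the clustering rule in the text.

```python
import numpy as np

def solve_permutation_by_spikedness(Y):
    """Reorder separated signals per frequency bin by spikedness.

    Y: complex array of shape (n_signals, n_bins, n_frames) holding the
    external separated signals Y_i(f, t).  For each bin f, the Laplacian
    scale parameter alpha_i(f) = mean_t |Y_i(f, t)| serves as the
    spikedness, and the signal with the largest value is labeled as the
    specific voice (placed first).
    """
    n_sig, n_bins, _ = Y.shape
    Y_out = np.empty_like(Y)
    labels = np.empty((n_bins, n_sig), dtype=int)
    for f in range(n_bins):
        # Scale parameters alpha_i(f): time average of |Y_i(f, t)|.
        alpha = np.mean(np.abs(Y[:, f, :]), axis=1)
        order = np.argsort(alpha)[::-1]  # largest spikedness first
        Y_out[:, f, :] = Y[order, f, :]
        labels[f] = order
    return Y_out, labels
```

On real data, `Y` would come from the ICA stage after gain correction; `labels[f]` then records, per bin, the input indices sorted from most to least voice-like.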

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)

Abstract

A signal separation system includes: an external microphone; an internal sensor that detects only an internal noise from an internal noise source present inside the system; a discrete Fourier transform unit (310) that performs a discrete Fourier transform on signals from the external microphone and the internal sensor; an independent component analysis unit (320) that performs independent component analysis on transformed signals that have been subjected to a discrete Fourier transform so that an internal noise separated signal that contains only the internal noise is extracted on the basis of the transformed signal of the signal detected by the internal sensor and external separated signals that are independent of the internal noise separated signal and that do not contain the internal noise are extracted; and a permutation solving unit (340) that executes permutation solving on the external separated signals to extract a specific voice.

Description

SIGNAL SEPARATION SYSTEM AND SIGNAL SEPARATION METHOD
BACKGROUND OF THE INVENTION
1. Field of the Invention
[0001] The invention relates to a signal separation system and signal separation method that extract a specific signal in a state where a plurality of signals are mixed in a space and, more particularly, to a technique for solving a permutation problem.
2. Description of the Related Art
[0002] Independent component analysis (ICA) is a known technique that separates and decodes a plurality of original signals on the basis of their statistical independence when the original signals are linearly mixed with unknown coefficients (see Japanese Patent Application Publication No. 2004-145172 (JP-A-2004-145172)).
[0003] Where the signals obtained by observing a plurality of original signals (sound sources) s(t) with a plurality of microphones are denoted by x(t), the observed signals x(t) are expressed by the following mathematical expression (1).
$$x(t) = \sum_{\tau} A(\tau)\, s(t-\tau) \tag{1}$$
[0004] In the ICA, the signals S(f, t) are estimated through independent component analysis in the frequency domain by using the signals X(f, t). The signals X(f, t) are obtained by transforming the observed signals x(t) into signals in the time-frequency domain through a short-time discrete Fourier transform. Here, the signals S(f, t) and X(f, t) are respectively obtained by performing a short-time discrete Fourier transform on the original signals s(t) and the observed signals x(t). The following mathematical expression (2) is considered to estimate the signals S(f, t) in the time-frequency domain. In the mathematical expression (2), Y(f, t) represents a column vector that has the kth output Yk(f, t) as its elements, and W(f) represents an n-by-n matrix (separating matrix) having Wij(f) as its elements.
$$Y(f,t) = W(f)\, X(f,t) \tag{2}$$
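As a concrete illustration of how the observed signals x(t) become the time-frequency signals X(f, t), the short-time discrete Fourier transform can be sketched with plain numpy. The frame length, hop size, and Hann window below are illustrative choices, not parameters taken from the patent:

```python
import numpy as np

def short_time_dft(x, frame_len=256, hop=128):
    """Short-time discrete Fourier transform of a 1-D signal x(t).

    Returns a complex array X[f, t]: rows index frequency bins f,
    columns index frames t, matching the X(f, t) notation in the text.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Windowed frames, one per time index t.
    frames = np.stack([x[t * hop: t * hop + frame_len] * window
                       for t in range(n_frames)])
    # Real FFT along each frame; transpose so bins come first.
    return np.fft.rfft(frames, axis=1).T
```

Each column of the result is one frame's spectrum; per-bin ICA then operates on the rows, one frequency bin at a time.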
[0005] Subsequently, the separating matrix W(f) is calculated such that the outputs Y1(f, t) to Yn(f, t) are statistically independent of one another (in practice, such that their independence is maximized) as time t is varied while the frequency bin f is fixed. After statistically independent outputs Y1(f, t) to Yn(f, t) have been obtained for all frequency bins f on the basis of the calculated separating matrix W(f), an inverse Fourier transform yields the separated signals y(t) in the time domain.
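The per-bin estimation of W(f) can be sketched with a natural-gradient ICA update, one common learning rule for complex-valued frequency-domain ICA. The patent defers the details of the ICA algorithm to JP-A-2004-145172, so the nonlinearity and update rule here are assumptions for the sketch, not the patented procedure:

```python
import numpy as np

def ica_one_bin(X, n_iter=300, lr=0.1):
    """Estimate a separating matrix W(f) for a single frequency bin.

    X: complex array (n_sources, n_frames) holding X(f, t) at a fixed
    bin f.  Applies the natural-gradient update
        W <- W + lr * (I - E[phi(Y) Y^H]) W
    with the super-Gaussian nonlinearity phi(y) = y / |y|, which drives
    the outputs toward statistical independence as t varies.
    """
    n = X.shape[0]
    W = np.eye(n, dtype=complex)
    for _ in range(n_iter):
        Y = W @ X
        phi = Y / (np.abs(Y) + 1e-9)          # sign nonlinearity
        grad = np.eye(n) - (phi @ Y.conj().T) / X.shape[1]
        W = W + lr * grad @ W
    return W
```

Running this for every bin f yields the set of separating matrices; the scale and ordering of the rows remain ambiguous, which is exactly the indeterminacy the gain correction and permutation solving stages address.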
[0006] However, in independent component analysis in the time-frequency domain, the signal separating process is performed for each frequency bin, and the relationship among the frequency bins is not considered. Therefore, even when the separation itself is successful, an inconsistency of separation destinations may occur among the frequency bins. Such an inconsistency is a phenomenon in which, for example, the signal Y1 originates in the signal S1 at frequency bin f = 1, whereas the signal Y1 originates in the signal S2 at frequency bin f = 2; this is called the permutation problem.
[0007] JP-A-2004-145172 describes a method of solving the permutation problem in such a manner that the incoming directions of signals are estimated and then the signals are labeled on the basis of the directional information of each signal. However, actually, not all sound sources are simple sound sources, so it is not always possible to correctly estimate the incoming directions of the signals. For example, in the case of a diffusive noise, the direction of the noise cannot be identified, and, therefore, wrong labeling occurs.
[0008] In addition, WO/2009/113192 and the following non-patent document describe a method in which the joint probability density distribution of each of the separated signals is calculated and then the separated signals are divided into voice and noise on the basis of the shape of the joint probability density distribution. In this method, for example, a signal whose joint probability density distribution is non-Gaussian is determined to be a specific voice signal, and a signal whose joint probability density distribution is Gaussian is determined to be a noise signal. According to this method, even a diffusive noise is accurately labeled, so it is possible to determine the separation destination of a signal with high precision. The non-patent document is "An Improved permutation solver for blind signal separation based front-ends in robot audition" (Jani Even, Hiroshi Saruwatari, Kiyohiro Shikano), IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2008), Nice, France, pp. 2172-2177, September 2008.
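The distribution-shape test of the cited method can be approximated with a simple excess-kurtosis check: a strongly non-Gaussian (high-kurtosis) signal is labeled as voice and a near-Gaussian one as noise. This is a crude stand-in for the joint-probability-density comparison in WO/2009/113192, with a function name and threshold chosen arbitrarily for illustration:

```python
import numpy as np

def label_by_kurtosis(signals, threshold=1.0):
    """Label each separated signal 'voice' or 'noise' from its kurtosis.

    An excess kurtosis well above 0 indicates a spiky, non-Gaussian
    (voice-like) distribution; a value near 0 indicates a Gaussian
    (noise-like) one.
    """
    labels = []
    for y in signals:
        y = np.asarray(y, dtype=float)
        z = (y - y.mean()) / y.std()
        excess_kurtosis = np.mean(z ** 4) - 3.0  # 0 for a Gaussian
        labels.append("voice" if excess_kurtosis > threshold else "noise")
    return labels
```

As paragraph [0010] notes, this simple test is exactly what fails in the robot setting: a motor noise can also be strongly non-Gaussian, so kurtosis alone cannot distinguish it from voice.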
[0009] Here, the following case is assumed as an environment in which a signal separation system is actually used. FIG. 5 is a view that shows a robot 10 having a voice recognition function. The robot 10 includes a microphone array 12 formed of a plurality of microphones 11 and a signal separation device 20 that processes observed signals from the microphone array 12. An ambient noise S2 enters the microphone array 12 together with a user voice S1. Furthermore, the robot itself is a noise source: the robot 10 includes a power source 30, such as a motor, so a noise S3 from the power source 30 also enters the microphones 11.
[0010] Thus, the observed signals x(t) contain the noise S3 from the power source 30. The signal separation device 20 performs independent component analysis on signals that contain the user voice S1(f, t), the ambient noise S2(f, t) and the power source noise S3(f, t) to calculate statistically independent separated signals Y1(f, t) to Yn(f, t), and then labels the separated signals Y1(f, t) to Yn(f, t). However, if the signal whose joint probability density distribution is non-Gaussian is simply determined to be a user voice as described above, wrong labeling may occur, because the noise S3 of the power source 30 also has a non-Gaussian joint probability density distribution with a high kurtosis.
[0011] In this way, when the existing method described in WO/2009/113192 and the above non-patent document is applied to an actual environment, a separated signal may be labeled wrongly. Furthermore, the computation load for calculating a joint probability density distribution is high; if the shape of the joint probability density distribution of a power source noise must be calculated in addition to those of a user voice and an ambient noise, the computation load becomes excessive.
SUMMARY OF THE INVENTION
[0012] A first aspect of the invention provides a signal separation system that separates an observed signal in the time domain, which mixedly contains a plurality of signals, into the plurality of signals using independent component analysis, and that extracts a specific voice from the separated signals. The signal separation system includes: an external microphone that is oriented outside of the signal separation system; an internal sensor that detects only an internal noise from an internal noise source present inside the signal separation system; a discrete Fourier transform unit that performs a discrete Fourier transform on signals from the external microphone and the internal sensor; an independent component analysis unit that performs independent component analysis on transformed signals that have been subjected to a discrete Fourier transform by the discrete Fourier transform unit so that an internal noise separated signal that contains only the internal noise is extracted on the basis of the transformed signal of the signal detected by the internal sensor and external separated signals that are independent of the internal noise separated signal and that do not contain the internal noise are extracted; and a permutation solving unit that executes permutation solving on the external separated signals to extract the specific voice.
[0013] In the first aspect of the invention, the permutation solving unit may include a spikedness calculation unit that calculates a spikedness, which is a degree of peakedness of the probability density distribution of each of the external separated signals, and a clustering unit that labels the external separated signals as the specific voice or an ambient noise on the basis of the spikedness. In the above configuration, the spikedness calculation unit may calculate, as the spikedness, the scale parameter of a Laplacian distribution obtained when each of the external separated signals is fitted with a Laplacian distribution.
[0014] In the above configuration, the clustering unit may determine the external separated signal having the largest spikedness as the specific voice.
[0015] A second aspect of the invention provides a signal separation method. The signal separation method separates an observed signal in the time domain, which mixedly contains a plurality of signals and is observed in a system that includes an external microphone oriented outside of the system and an internal sensor that detects only an internal noise from an internal noise source present inside the system, into the plurality of signals using independent component analysis, and extracts a specific voice from the separated signals. The signal separation method includes: performing a discrete Fourier transform on signals from the external microphone and the internal sensor; performing independent component analysis on transformed signals that have been subjected to a discrete Fourier transform so that an internal noise separated signal that contains only the internal noise is extracted on the basis of the transformed signal of the signal detected by the internal sensor and external separated signals that are independent of the internal noise separated signal and that do not contain the internal noise are extracted; and executing permutation solving on the external separated signals to extract the specific voice.
BRIEF DESCRIPTION OF DRAWINGS
[0016] The features, advantages, and technical and industrial significance of this invention will be described below with reference to the accompanying drawings, in which like numerals denote like elements, and wherein:
FIG 1 is a view that shows a robot equipped with a signal separation device 200 according to an embodiment of the invention;
FIG 2 is a block diagram of the signal separation device 200;
FIG 3 is a block diagram of a permutation solving unit 340;
FIG 4 is a schematic view of the flow of calculating a spikedness (scale parameter αi(f)) from each of the observed signals x1(t) and x2(t); and
FIG 5 is a view that shows a robot 10 having a voice recognition function.
DETAILED DESCRIPTION OF EMBODIMENTS
[0017] Embodiments of the invention are illustrated in the drawings, and will be described with reference to the reference numerals assigned to components in the drawings. FIG 1 is a view that shows a robot equipped with a signal separation device according to the first embodiment. The robot 100 includes external microphones 110, an internal sensor 120 and a signal separation device 200.
[0018] Each of the external microphones 110 is a sound collecting microphone provided on the body surface of the robot 100. Here, for the sake of description, it is assumed that a first external microphone 111 and a second external microphone 112 are provided. The external microphones 110 receive a voice S1 from a user and a noise S2 from around the external microphones 110. In addition, the external microphones 110 also receive a noise S3 from a power source 30.
[0019] The internal sensor 120 exclusively detects the noise S3 from the power source 30. The internal sensor 120 detects the noise from the power source 30, but the internal sensor 120 does not detect a sound signal (Si or S2) from the outside. It is desirable that the internal sensor 120 is, for example, arranged at a location in proximity to the external microphones 110, such as the back side of the external microphones 110. Such a sensor that exclusively detects the noise S3 from the power source 30 may be, for example, an acceleration sensor or a microphone having high directivity.
[0020] Note that the number of the external microphones 110 and the number of the internal sensors 120 are not limited, and are increased or reduced where necessary. For example, when a plurality of the external microphones 110 are provided, the internal sensor 120 may be provided in one-to-one correspondence with each external microphone 110.
[0021] Here, the user voice is denoted by S1(f, t), the ambient noise is denoted by S2(f, t) and the power source noise is denoted by S3(f, t). In addition, a signal observed by the first external microphone 111 is denoted by X1(f, t), a signal observed by the second external microphone 112 is denoted by X2(f, t) and a signal observed by the internal sensor 120 is denoted by R1(f, t). At this time, the relationship between the original signals and the observed signals may be expressed by the following mathematical expression (3) using an unknown coefficient matrix A(f).
[X1(f, t)]   [A11(f)  A12(f)  A13(f)] [S1(f, t)]
[X2(f, t)] = [A21(f)  A22(f)  A23(f)] [S2(f, t)]     (3)
[R1(f, t)]   [  0       0     A33(f)] [S3(f, t)]
[0022] Here, the first external microphone 111 and the second external microphone 112 receive the user voice S1(f, t), the ambient noise S2(f, t) and the power source noise S3(f, t), so the components (A11(f), A12(f), A13(f), A21(f), A22(f) and A23(f)) of the coefficient matrix A(f) corresponding to the observed signals X1(f, t) and X2(f, t) are not 0. In contrast, the internal sensor 120 does not receive the user voice S1(f, t) or the ambient noise S2(f, t), so the components of the coefficient matrix A(f) corresponding to the observed signal R1(f, t) are 0 except the coefficient A33(f) corresponding to the power source noise S3(f, t).

[0023] FIG 2 is a block diagram of the signal separation device 200. The signal separation device 200 includes an analog/digital (A/D) conversion unit 210, a noise suppressing unit 300 and a voice recognition unit 220.
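The zero structure of A(f) can be checked numerically. The following is a minimal, hypothetical numpy sketch (names, dimensions and distributions are illustrative, not from the patent): because the internal-sensor row of the mixing matrix is zero except for its last entry, R1 contains only the power source noise S3.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100  # number of frames in one frequency bin

# Complex source spectra for one frequency bin: rows S1, S2, S3
S = rng.laplace(size=(3, T)) + 1j * rng.laplace(size=(3, T))

# Mixing matrix A(f): internal-sensor row is zero except A33
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A[2, :2] = 0.0

X = A @ S   # rows: X1, X2 (external mics) and R1 (internal sensor)

# R1 is a scaled copy of the power source noise S3 alone
print(np.allclose(X[2], A[2, 2] * S[2]))   # True
```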
[0024] The A/D conversion unit 210 converts respective signals input from the external microphones 110 and the internal sensor 120 into digital signals and then outputs the digital signals to the noise suppressing unit 300.
[0025] The noise suppressing unit 300 executes a process of suppressing noise contained in the input digital signals. The noise suppressing unit 300 includes a short-time discrete Fourier transform unit 310, an independent component analysis unit 320, a gain correction unit 330, a permutation solving unit 340 and an inverse discrete Fourier transform unit 350.
[0026] The short-time discrete Fourier transform unit 310 performs a short-time discrete Fourier transform on pieces of digital data input from the A/D conversion unit 210.
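As a rough illustration of what the short-time discrete Fourier transform unit 310 produces, here is a minimal numpy sketch; the window choice, frame length and hop size are assumptions for illustration, not values from the patent.

```python
import numpy as np

def stft(x, frame_len=512, hop=128):
    """Short-time discrete Fourier transform: Hann-windowed overlapping
    frames of x, returning a spectrum X[f, t] (frequency bins x frames)."""
    win = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[t * hop : t * hop + frame_len] * win
                       for t in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=0)

x1 = np.random.randn(16000)      # stand-in for one second of audio at 16 kHz
X1 = stft(x1)
print(X1.shape)                  # (257, 122): 257 frequency bins, 122 frames
```

Each microphone signal (and the internal sensor signal) would pass through the same transform, giving the frequency-domain observations used by the subsequent ICA stage.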
[0027] The independent component analysis unit 320 performs independent component analysis (ICA) on observed signals expressed in the time-frequency domain, obtained by the short-time discrete Fourier transform unit 310, and then calculates a separating matrix for each frequency bin. The detailed process of independent component analysis is, for example, described in JP-A-2004-145172.
[0028] Here, the observed signals x1(t), x2(t) and r1(t) are each subjected to a short-time discrete Fourier transform, and the obtained signals (hereinafter, also referred to as "transformed signals") are denoted by X1(f, t), X2(f, t) and R1(f, t). Then, statistically independent separated signals (hereinafter, also referred to as "separated signals") Y1(f, t), Y2(f, t) and Q1(f, t) are extracted on the basis of the following mathematical expression (4) using the separating matrix W(f).

[Y1(f, t)]   [W11(f)  W12(f)  W13(f)] [X1(f, t)]
[Y2(f, t)] = [W21(f)  W22(f)  W23(f)] [X2(f, t)]     (4)
[Q1(f, t)]   [  0       0     W33(f)] [R1(f, t)]
[0029] In the present embodiment, the separated signal Q1(f, t) (an example of the internal noise separated signal) that is obtained by multiplying the transformed signal R1(f, t), which contains only the power source noise S3(f, t), by the coefficient W33(f) is generated. ICA adaptively learns the separating matrix W(f) so that the separated signal Q1(f, t) is independent of the separated signals Y1(f, t) and Y2(f, t), so the separated signals Y1(f, t) and Y2(f, t) that do not contain the power source noise S3(f, t) are extracted (semi-blind signal separation). That is, the separated signals Y1(f, t) and Y2(f, t) are each a component corresponding to something other than the power source noise S3(f, t), that is, one of the user voice S1(f, t) and the ambient noise S2(f, t).
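A toy per-frequency-bin sketch of the constrained learning is shown below. It is an assumption-laden illustration, not the patent's exact update rule: it uses a generic natural-gradient ICA step with the score function y/|y| (a common choice for super-Gaussian sources) and simply re-imposes W31 = W32 = 0 after each iteration, so the third output Q1 depends only on the internal sensor signal R1.

```python
import numpy as np

def semi_blind_ica_bin(X, n_iter=100, mu=0.05):
    """Semi-blind ICA for one frequency bin.

    X      : (3, T) complex observations, rows [X1; X2; R1].
    Returns: W, a 3x3 separating matrix with W[2, 0] = W[2, 1] = 0,
             so the third output Q1 = W33 * R1 contains only internal noise.
    """
    n, T = X.shape
    W = np.eye(n, dtype=complex)
    for _ in range(n_iter):
        Y = W @ X
        phi = Y / (np.abs(Y) + 1e-9)                    # score for super-Gaussian pdf
        dW = (np.eye(n) - (phi @ Y.conj().T) / T) @ W   # natural-gradient step
        W = W + mu * dW
        W[2, :2] = 0.0                                  # semi-blind constraint
    return W

# Hypothetical usage on a synthetic mixture
rng = np.random.default_rng(1)
S = rng.laplace(size=(3, 400)) + 1j * rng.laplace(size=(3, 400))
A = rng.standard_normal((3, 3))
A[2, :2] = 0.0          # internal sensor observes only S3
W = semi_blind_ica_bin(A @ S)
```

Because the constrained row is never allowed to mix in the external channels, the internal noise estimate cannot leak into Y1 or Y2 through that output.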
[0030] The gain correction unit 330 executes a gain correction process on the separating matrix W(f) at each frequency calculated by the independent component analysis unit 320.
[0031] The permutation solving unit 340 executes a process for solving the permutation problem. FIG 3 is a block diagram of the permutation solving unit 340. Here, in the present embodiment, among the separated signals Y1(f, t), Y2(f, t) and Q1(f, t) that are separated by the independent component analysis unit 320, the separated signals Y1(f, t) and Y2(f, t) are already identified as corresponding to components other than the power source noise S3(f, t), that is, one of the user voice S1(f, t) and the ambient noise S2(f, t). Thus, the objects of permutation solving are the separated signals Y1(f, t) and Y2(f, t). The separated signals Y1(f, t) and Y2(f, t) are input to the permutation solving unit 340, and the separated signal Q1(f, t) is directly input to the subsequent inverse discrete Fourier transform unit 350.
[0032] Then, the permutation solving according to the present embodiment utilizes the fact that the probability density distribution of the user voice S1(f, t) has a shape that is spikier than the probability density distribution of the ambient noise S2(f, t). Furthermore, in order to estimate the spikedness (degree of peakedness) of the probability density distribution, the scale parameter αi(f) of a Laplacian distribution is used. Here, when the scale parameter αi(f) of the Laplacian distribution is estimated, the expected value of the absolute value of the separated signal Yi(f, t) is utilized. Hereinafter, the description will be made sequentially.

[0033] The permutation solving unit 340 includes a spikedness calculation unit 341 and a clustering determination unit 342.
[0034] The spikedness calculation unit 341 calculates the spikedness of the probability density distribution (degree of peakedness of the distribution) of each of the separated signals Y1(f, t) and Y2(f, t). The scale parameter αi(f) of the Laplacian distribution when the separated signal Yi(f, t) is fitted with a Laplacian distribution is used as the spikedness. The scale parameter αi(f) may be calculated through the following mathematical expression (5) using a maximum likelihood method.
αi(f) = Et{|Yi(f, t)|}     (5)
[0035] Here, the separated signal Yi(f, t) is a complex spectrum, so |Yi(f, t)| means the absolute value of a complex number. In addition, Et{|Yi(f, t)|} means the average of |Yi(f, t)| over a predetermined number of frames.
[0036] Here, FIG 4 is a schematic view that shows the flow of calculating the spikednesses (scale parameters αi(f)) from the observed signals x1(t), x2(t) and r1(t). A voice signal collected by the first external microphone 111 is the observed signal x1(t), a voice signal collected by the second external microphone 112 is the observed signal x2(t) and a signal detected by the internal sensor 120 is the observed signal r1(t). These observed signals are subjected to a discrete Fourier transform for each frame of a predetermined duration, and the results are the transformed signals X1(f, t), X2(f, t) and R1(f, t). The results of independent component analysis on the transformed signals X1(f, t), X2(f, t) and R1(f, t) are the separated signals Y1(f, t), Y2(f, t) and Q1(f, t). At this time, the spikedness (scale parameter αi(fk)) for the frequency bin f = fk is, for example, expressed by the following mathematical expression (6) using the frames from t0 to t2.
αi(fk) = (1 / (t2 - t0 + 1)) Σ[t = t0 .. t2] |Yi(fk, t)|     (6)
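The maximum likelihood estimate of expressions (5) and (6) reduces to averaging the magnitude spectrum over the chosen frames. A minimal numpy sketch (the function name is illustrative, not from the patent):

```python
import numpy as np

def laplacian_scale(Y, t0, t2):
    """ML estimate of the Laplacian scale parameter per frequency bin:
    alpha_i(f) = average of |Y_i(f, t)| over frames t0..t2."""
    return np.mean(np.abs(Y[:, t0:t2 + 1]), axis=1)

# Two frequency bins, three frames of a complex spectrum
Y = np.array([[3 + 4j, 0 + 1j, 1 + 0j],
              [1 + 0j, 1 + 0j, 1 + 0j]])
print(laplacian_scale(Y, 0, 2))   # approximately [2.333 1.0], i.e. (5+1+1)/3 and 1
```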
[0037] The clustering determination unit 342 uses the thus calculated spikednesses (scale parameters αi(fk)) to label the separated signals Y1(fk, t) and Y2(fk, t) and, where necessary, interchanges the separated signals Y1(fk, t) and Y2(fk, t). That is, one of the separated signals Y1(fk, t) and Y2(fk, t) is determined to correspond to the user voice S1(f, t), the other one is determined to correspond to the ambient noise S2(f, t), and then the sorting destinations of the user voice S1(f, t) and the ambient noise S2(f, t) are standardized at every frequency bin. Specifically, the one that has the largest spikedness (scale parameter αi(fk)) is determined to correspond to the user voice.
[0038] For example, when the user voice is sorted to index number 1 and the ambient noise is sorted to index number 2, the process will be as follows.
(Case 1) The case where α1(fk) ≥ α2(fk) is assumed as case 1. In this case, it may be determined that the separated signal Y1(fk, t) corresponds to the user voice S1(fk, t) and the separated signal Y2(fk, t) corresponds to the ambient noise S2(fk, t). In this case, interchanging is not required.
[0039] (Case 2) The case where α1(fk) < α2(fk) is assumed as case 2. In this case, it may be determined that the separated signal Y2(fk, t) corresponds to the user voice S1(fk, t) and the separated signal Y1(fk, t) corresponds to the ambient noise S2(fk, t). In this case, at the frequency bin fk, the separated signals Y1(fk, t) and Y2(fk, t) are interchanged.
[0040] Such clustering is executed at every frequency bin.
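The per-bin relabeling of cases 1 and 2 can be sketched as follows, as a hypothetical vectorized helper (names are illustrative); it assumes spikedness arrays alpha1 and alpha2 with one entry per frequency bin, computed as in expression (6).

```python
import numpy as np

def solve_permutation(Y1, Y2, alpha1, alpha2):
    """Label the output with the larger scale estimate (larger spikedness,
    per the embodiment's convention) as the user voice at every frequency
    bin, and swap the bins where alpha1 < alpha2 (case 2).

    Y1, Y2        : (F, T) separated spectra
    alpha1, alpha2: (F,) spikedness per bin
    """
    swap = alpha1 < alpha2                       # case 2 bins
    voice = np.where(swap[:, None], Y2, Y1)      # index 1: user voice
    noise = np.where(swap[:, None], Y1, Y2)      # index 2: ambient noise
    return voice, noise

Y1 = np.array([[1.0], [2.0]])
Y2 = np.array([[10.0], [20.0]])
voice, noise = solve_permutation(Y1, Y2,
                                 np.array([0.5, 2.0]),   # alpha1
                                 np.array([1.0, 1.0]))   # alpha2
print(voice.ravel())   # [10.  2.]  (bin 0 swapped, bin 1 kept)
```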
[0041] Lastly, the inverse discrete Fourier transform unit 350 performs an inverse discrete Fourier transform, transforms the data Y1(f, t), Y2(f, t) and Q1(f, t) in the time-frequency domain into data in the time domain, and then outputs the data.
[0042] With the above configuration, the following advantageous effects may be obtained.
(1) The internal sensor 120 that exclusively detects only the noise from the internal noise source (power source) 30 is provided. Then, independent component analysis optimizes the separated signal Q1(f, t) for estimating the internal noise and the other separated signals Y1(f, t) and Y2(f, t) so as to be independent of each other. The separated signal Q1(f, t) is generated from only the transformed signal R1(f, t) from the internal sensor 120, so the internal noise is definitely output to the separated signal Q1(f, t). If the internal noise were contained in the separated signals Y1(f, t) and Y2(f, t), correlation would occur, so those components are removed through optimization in ICA. Thus, the internal noise is output to only the separated signal Q1(f, t). By so doing, one of the separated signals Y1(f, t) and Y2(f, t) other than the separated signal Q1(f, t) corresponds to the user voice. That is, it is only necessary to solve the permutation problem for the separated signals Y1(f, t) and Y2(f, t) other than the separated signal Q1(f, t). Thus, it is possible to reduce the calculation load for permutation solving.
[0043] (2) The noise from the internal noise source (power source) 30 is similar to the user voice in the high degree of peakedness of its probability density distribution, or the like, and, therefore, it may be difficult to solve the permutation problem between the internal noise and the user voice. In the present embodiment, the sensor that detects only the internal noise is utilized, and the components W31(f) and W32(f) of the separating matrix W(f) are modeled as 0. By so doing, the internal noise is concentrated in the separated signal Q1(f, t), and is not contained in the remaining separated signals Y1(f, t) and Y2(f, t). Thus, it is possible to improve the accuracy of separating and extracting the user voice.
[0044] (3) In the present embodiment, in labeling, the spikedness of the probability density distribution (degree of peakedness of the distribution) of each of the separated signals Y1(f, t) and Y2(f, t) is used, and, in addition, the scale parameter αi(f) of the Laplacian distribution when the separated signal Yi(f, t) is fitted with a Laplacian distribution is used as the spikedness. With the above method, it is possible to remarkably reduce the calculation load.
[0045] Note that the aspect of the invention is not limited to the embodiment described above; it may be appropriately modified without departing from the scope of the invention. For example, in the above embodiment, the robot 100 is equipped with the signal separation device 200; instead, the aspect of the invention may be applied to a voice recognition system of an automobile, a telephone, or the like.

CLAIMS:
1. A signal separation system that separates an observed signal in the time domain, which mixedly contains a plurality of signals, into the plurality of signals using independent component analysis, and that extracts a specific voice from the separated signals, the signal separation system comprising:
an external microphone that is oriented outside of the signal separation system;

an internal sensor that detects only an internal noise from an internal noise source present inside the signal separation system;
a discrete Fourier transform unit that performs a discrete Fourier transform on signals from the external microphone and the internal sensor;
an independent component analysis unit that performs independent component analysis on transformed signals that have been subjected to a discrete Fourier transform by the discrete Fourier transform unit so that an internal noise separated signal that contains only the internal noise is extracted on the basis of the transformed signal of the signal detected by the internal sensor, and external separated signals that are independent of the internal noise separated signal and that do not contain the internal noise are extracted; and
a permutation solving unit that executes permutation solving on the external separated signals to extract the specific voice.
2. The signal separation system according to claim 1, wherein the permutation solving unit includes
a spikedness calculation unit that calculates a spikedness, which is a degree of peakedness of probability density distribution of each of the external separated signals; and
a clustering unit that labels the external separated signals as the specific voice or an ambient noise on the basis of the spikedness.
3. The signal separation system according to claim 2, wherein the spikedness calculation unit calculates a scale parameter, as the spikedness, of Laplacian distribution when each of the external separated signals is subjected to fitting with Laplacian distribution.
4. The signal separation system according to claim 3, wherein
the spikedness calculation unit calculates an expected value of an absolute value of each of the external separated signals as a maximum likelihood value of the scale parameter.
5. The signal separation system according to claim 3 or 4, wherein
the spikedness calculation unit calculates the scale parameter αi(f) using the following mathematical expression when a separated signal is denoted by Yi(f, t),
αi(f) = Et{|Yi(f, t)|}
where Et{|Yi(f, t)|} represents an average of |Yi(f, t)| over a predetermined number of frames.
6. The signal separation system according to any one of claims 2 to 5, wherein the clustering unit labels the external separated signal having the largest spikedness as the specific voice.
7. The signal separation system according to claim 1, wherein
the independent component analysis unit performs independent component analysis using a separating matrix in which a component corresponding to the internal noise, among internal components that are components corresponding to the transformed signal of the signal detected by the internal sensor, is not zero and the other internal components are zero.
8. The signal separation system according to claim 7, wherein the independent component analysis unit adaptively learns the separating matrix so that the internal noise separated signal is independent of the external separated signals.
9. The signal separation system according to claim 1, wherein
the internal sensor is arranged on a back side of the external microphone.
10. A signal separation method that separates an observed signal in the time domain, which mixedly contains a plurality of signals and is observed in a system that includes an external microphone that is oriented outside of the system and an internal sensor that detects only an internal noise from an internal noise source present inside the system, into the plurality of signals using independent component analysis, and that extracts a specific voice from the separated signals, the signal separation method comprising:
performing a discrete Fourier transform on signals from the external microphone and the internal sensor;
performing independent component analysis on transformed signals that have been subjected to a discrete Fourier transform so that an internal noise separated signal that contains only the internal noise is extracted on the basis of the transformed signal of the signal detected by the internal sensor and external separated signals that are independent of the internal noise separated signal and that do not contain the internal noise are extracted; and
executing permutation solving on the external separated signals to extract the specific voice.
PCT/IB2010/002660 2009-10-09 2010-10-07 Signal separation system and signal separation method WO2011042808A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2009234978A JP2011081293A (en) 2009-10-09 2009-10-09 Signal separation device and signal separation method
JP2009-234978 2009-10-09

Publications (2)

Publication Number Publication Date
WO2011042808A1 true WO2011042808A1 (en) 2011-04-14
WO2011042808A8 WO2011042808A8 (en) 2011-10-20



Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111599234A (en) * 2020-05-19 2020-08-28 黑龙江工业学院 Automatic English spoken language scoring system based on voice recognition
CN111682881A (en) * 2020-06-17 2020-09-18 北京润科通用技术有限公司 Communication reconnaissance simulation method and system suitable for multi-user signals
CN118347546A (en) * 2024-04-25 2024-07-16 湖南大学 Coupling sensing method for micro electric field and sensor


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004145172A (en) 2002-10-28 2004-05-20 Nippon Telegr & Teleph Corp <Ntt> Method, apparatus and program for blind signal separation, and recording medium where the program is recorded
WO2009113192A1 (en) 2008-03-11 2009-09-17 トヨタ自動車株式会社 Signal separating apparatus and signal separating method






Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 10777098; Country of ref document: EP; Kind code of ref document: A1)
DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 10777098; Country of ref document: EP; Kind code of ref document: A1)