JP3936819B2 - Microphone system

Info

Publication number: JP3936819B2
Application number: JP12151899A
Authority: JP
Inventors: 望斉藤; 孝一中田; 真吾木内
Original assignee: Alpine Electronics Inc
Current assignee: Alpine Electronics Inc
Priority date: 1999-04-28
Filing date: 1999-04-28
Publication date: 2007-06-27
Anticipated expiration: 2019-04-28
Also published as: JP2000312396A

Description

【０００１】
【発明の属する技術分野】
本発明はマイクロホンシステムに係わり、特に、２つマイクロホンの出力信号を用いて適応信号処理を行いSN比を改善した話者音声信号を出力するマイクロホンシステムに関する。
【０００２】
【従来の技術】
現在の音声認識システムは、15dB以上のSN比（S：音声／N：ノイズ）が確保されている場合、約95％の認識率を実現できるくらいの技術レベルにまで達している。しかし、周囲に存在するノイズによりSN比が低下すると、それに伴って認識率が急激に低下する性質も有している。図８はSN比と認識性能との関係をいくつかの種類のマイクロホン（無指向性、単一指向性、狭指向性、AMNOR(Adaptive Microphone-array for Noise Reduction))について評価したもので、SN比と認識率がおおむねＳ字特性100を示す帯の中に包含されている。この図８から明らかなように、認識率はSN比の低下により急激に低下し、SN比が0dBの環境下において約50％にまで低下してしまう。
【０００３】
そのため、自動車が発生するノイズ（エンジン音・ロードノイズ・パターンノイズ・風切り音など）が存在する自動車車室内において、上記のような認識性能の劣化は避けられず、音声認識システムを車載化する上で大きな問題の一つとなっている。
前記したような事情から、周囲に存在するノイズの影響を少なくし、高いSN比で音声を受音するための方式が種々提案されており、複数のマイクロホンとディジタル信号処理を用いた高SN比受音システムはその一例である。かかる高SN比受音システムの中で最も簡単な構成のものは図９に示すように２つのマイクロホンを使用するシステムであるが、他にも、Griffith-Jim型アレイやAMNORといった、より高度なシステムが提案されている。
【０００４】
図９において、１，２は第１、第２のマイクロホン、３は適応信号処理部であり、誤差信号ｅが入力されると共にマイクロホン２の出力信号ｘ₂が参照信号として入力され、誤差信号ｅのパワーが最小となるようにLMS(Least Mean Square)アルゴリズムに基づいて適応信号処理を行う。適応信号処理部３において、３ａはLMS演算部、３ｂは例えばFIR型デジタルフィルタ構成の適応フィルタである。LMS演算部３ａは適応信号処理により誤差信号ｅのパワーが最小となるように適応フィルタ３ｂの係数を決定する。
【０００５】
４はマイクロホン１から出力されるノイズ信号を目標信号として入力される目標応答設定部であり、音響系の逆特性を精度よく近似するためのものである。適応フィルタ３ｂのタップ長の半分の信号遅延時間をｄとするとき、目標応答設定部４は該時間ｄの遅延特性を有し、オーディオ周波数帯域でフラットな特性（ゲイン１の特性）を有する。すなわち、目標応答設定部４は、図１０（ａ）に示すようにゲイン１のフラットな周波数特性を備え、図１０（ｂ）に示すように遅延時間ｄを有するインパルス応答特性を有している。この目標応答設定部４は、FIR型デジタルフィルタの遅延時間ｄに対応する係数を１にし、他の係数を０にすることにより実現できる。
５は減算部であり、目標応答設定部４から出力する目標応答より適応フィルタ３ｂの出力信号を減算して誤差信号ｅを出力する。
【０００６】
非音声認識時、マイクロホン１、２にはノイズのみが入力し、適応信号処理部３は適応信号処理により誤差信号ｅ、すなわち、ノイズ出力のパワーが最小となるようにフィルタ係数Wを決定する。一方、音声認識時には、適応信号処理部３はフィルタ係数の更新をせず、前記非音声認識時に決定したフィルタ係数Wを適応フィルタ３ｂに設定して音声信号を出力する。
図９に示すシステムに本来求められている理想的な性能は、音声認識時に出力信号として音声信号Xs(z)のみ(ノイズ出力は0)を出力することである。すなわち、ノイズ出力En(z)に関して、
En(z)＝Xn₁(z)z^-d−Xn₂(z)W(z) (1)
{En(z)}²の平均が最小値をとるように調整可能なパラメータ(適応フィルタ３ｂの係数)Wを決定することである。ただし、Xn₁(z)，Xn₂(z)はマイクロホン１、２の出力信号に含まれるノイズ信号である。
【０００７】
【発明が解決しようとする課題】
自動車には騒音源が多数存在するため、マイクロホン１，２が拾う自動車車室内ノイズのコヒーレンスは、マイクロホン１，２を遠ざけるにしたがって低下する傾向を有している。このため、２つのマイクロホン１，２を遠ざける程、(1)式の条件が満たされにくくなってしまう問題が生じ、マイクロホン１，２はできるだけ近い位置に配置する必要がある。
ところが、２つのマイクロホン１，２をできるだけ近い位置に配置すると、２つのマイクロホンにほぼ同様の音声とノイズがそれぞれ入射する可能性が高くなり、(1)式を満たすように適応フィルタ係数Wを決定してノイズを消去すると、音声までもが消去されてしまう。一方、音声が歪まないように適応フィルタ係数Wを決定すると、ノイズがほとんど消えず、SN比もほとんど改善されなくなってしまうという問題が発生する。
【０００８】
かかる問題は、図９に示したシステム特有のものではなく、Griffith-Jim型アレイやAMNORといった、より高度な高SN比受音システムを採用した場合でも、ほぼ同様に発生する。
以上から本発明の目的は、マイクロホンを２つ使用する図９のマイクロホンシステムにおいて、音声信号のSN比を改善できるようにすることである。
【０００９】
【課題を解決するための手段】
上記課題は本発明によれば、第１、第２の２つのマイクロホンと、第１のマイクロホンから出力される信号を入力されて目標信号を出力する目標信号出力部と、第２のマイクロホンから出力される信号を参照信号として入力されて適応信号処理を行う適応信号処理部と、適応信号処理部を構成する適応フィルタの出力信号と前記目標信号との差を出力する誤差信号発生部とを備えた車載のマイクロホンシステムにおいて、前記第１、第２のマイクロホンを接近して配置すると共に、第１のマイクロホンを話者の顔の真上車室天井に配置し、前記第２のマイクロホンを該第１のマイクロホンの位置より１〜５cm程度後頭部側の車室天井に離して配置することにより、第１のマイクロホンは第２のマイクロホンに比べて話者音声が大きく、かつ該話者音声に対する自動車内の話者音声以外の雑音が少なくなるようにし、第２のマイクロホンは第１のマイクロホンに比べて話者音声が小さく、かつ該話者音声に対する自動車内の話者音声以外の雑音が大きくなるようにし、前記適応信号処理部は、前記第１、第２のマイクロホンにノイズのみが入力する非音声認識時、適応信号処理により前記誤差信号発生部から出力する誤差信号のパワーが最小となるように前記適応フィルタの係数を決定し、前記第１、第２のマイクロホンにノイズと音声信号が入力する音声認識時、該適応フィルタの係数を更新せず、前記誤差信号を音声信号として出力することにより達成される。
このようにすれば、２つのマイクロホンの出力信号に含まれるノイズXn₁(z)，Xn₂(z)を略等しくでき、一方、２つのマイクロホンの出力信号に含まれる音声信号Xs₁(z)，Xs₂(z)を異ならせることができる。従って、ノイズ信号入力時のEn(z)の２乗平均値が最小となるように適応フィルタ係数Wを決定しても、(2)式の音声出力Es(z)は零とならず、音声信号のSN比を改善することができる。
【００１０】
マイクロホンの具体的な配置例としては、一方のマイクロホンを話者の顔の真上に配置し、他方のマイクロホンを該一方のマイクロホンの位置より１〜５cm程度後頭部側に離して配置する。このようにすれば、人間の音声放射特性により、マイクロホンが比較的近距離に配置されているにも拘らず、1つのマイクロホンはできるだけ高いSN比で音声を拾い、もう一方のマイクロホンでは、できるだけ低いSN比で音声を拾うようにできる。
【００１１】
【発明の実施の形態】
（ａ）マイクロホンシステムの構成
図１は本発明のマイクロホンシステムの構成図であり、図９のシステムと同一部分には同一符号を付している。図中、１０は話者であり、例えば自動車の運転手、１１，１２は第１、第２のマイクロホンである。第１のマイクロホン１１は話者の顔の真上天井に配置し、第２のマイクロホン１２は第１のマイクロホン位置より１〜５cm程度後頭部側の天井に配置する。
３は適応信号処理部で、誤差信号ｅが入力されると共にマイクロホン２の出力信号ｘ₂が参照信号として入力され、誤差信号ｅのパワーが最小となるようにLMSアルゴリズムに基づいて適応信号処理を行う。適応信号処理部３において、３ａはLMS演算部、３ｂはFIR型デジタルフィルタ構成の適応フィルタである。LMS演算部３ａは適応信号処理により誤差信号ｅのパワーが最小となるように適応フィルタ３ｂの係数を決定する。適応信号処理部３は、非音声認識時においてのみ適応信号処理により適応フィルタ３ｂのフィルタ係数Wを決定し、音声認識時にはフィルタ係数の更新をせず、前記非音声認識時に決定したフィルタ係数Wを適応フィルタ３ｂに設定する。
４はマイクロホン１１から出力する信号を目標信号として入力される目標応答設定部で、時間ｄの遅延特性を有し、かつ、オーディオ周波数帯域でフラットな特性（ゲイン１の特性）を有している。５は減算部で、目標応答設定部４から出力する目標応答より適応フィルタ３ｂの出力信号を減算して誤差信号ｅを出力する。
【００１２】
（ｂ）人間の音声放射特性
図２は人間の音声放射特性であり、（ａ）は話者１０の口元を含む水平面において口元から所定距離の位置における音声レベルを周波数毎に示す放射特性図、（ｂ）は話者１０の口元を含む垂直面において口元から所定距離の位置における音声レベルを周波数毎に示す放射特性図である。図中、Ａは125Hz〜250Hz、Bは500Hz〜700Hz、Ｃは1400Hz〜2000Hz、Ｄは4000Hz〜5600Hzの特性である。この放射特性図より明らかなように、人間が発生する音声は、話者正面方向に最も強く放射され、上方や下方、及び左右方向に放射される音声のパワーは、話者正面方向に比べて小さい。
それゆえ、図１のように第１のマイクロホン１１を話者の顔の真上天井に配置し、第２のマイクロホン１２を第１のマイクロホン位置より１〜５cm程度後頭部側の天井に配置すれば、▲１▼２つのマイクロホン１１，１２で受音するノイズのパワーをほぼ同一にできる一方、▲２▼２つのマイクロホン１１，１２で受音する音声パワーを異ならせることができる。すなわち、２つのマイクロホン１１，１２の出力信号に含まれるノイズXn₁(z)，Xn₂(z)を略等しくできると共に、２つのマイクロホン１１，１２の出力信号に含まれる音声信号Xs₁(z)，Xs₂(z)を異ならせることができ、［Xn₁(z)/Xn₂(z)］≠[Xs₁(z)/Xs₂(z)]とすることが可能である。
【００１３】
（ｃ）動作
マイクロホン１１、１２にノイズのみが入力する非音声認識時において、適応信号処理部３は適応信号処理により次式
En(z)＝Xn₁(z)z^-d−Xn₂(z)W(z) (1)
において、{En(z)}²の平均値が最小となるように適応フィルタ３ｂのフィルタ係数Wを決定する。
一方、音声認識時、適応信号処理部３はフィルタ係数の更新をせず、前記非音声認識時に決定したフィルタ係数Wを適応フィルタ３ｂに設定して音声信号を出力する。この場合、マイクロホン１１，１２の出力に含まれる音声信号Xs₁(z)，
Xs₂(z)は異なり、［Xn₁(z)/Xn₂(z)］≠[Xs₁(z)/Xs₂(z)]となるため、次式
Es(z)＝Xs₁(z)z^-d−Xs₂(z)W(z) (2)
により求まる音声出力Es(z)は最小値には(ノイズと異なり、あまり小さくは)ならない。
以上より、(1)式のノイズ出力En(z)のパワーが0となるように適応フィルタ係数Wを決定しても、(2)式の音声出力Es(z)はノイズと同様に小さくはならず、音声信号のSN比を改善することができる。
【００１４】
以上要約すれば、図１に示すように、マイクロホン１１，１２を比較的近距離に配置することで２つのマイクロホンが出力するノイズ間のコヒーレンスの低下を少なくし、更に、図２に示すような人間の音声放射特性を考慮することで、マイクロホン１１，１２が比較的近距離に配置されているにも拘らず、１つのマイクロホン１１はできるだけ高いSN比で音声を拾い、もう一方のマイクロホン１２では、できるだけ低いSN比で音声を拾う。この結果、ノイズ出力が零となるように適応フィルタ係数Wを決定しても、音声出力はノイズと同様に小さくはならず、音声信号のSN比を改善することができる。
【００１５】
（ｄ）マイクロホン位置とSN比改善量の検討
図２の放射特性より人間が発声し空間に放射した音声は、後頭部側で特に大きく減衰し、正面に放射される音声に比べてレベルが小さくなることがわかる。それゆえ、本発明のマイクロホンシステムは、図１に示したように人間の頭部真上付近から後頭部側にかけて設置するのが基本であり、このように第１、第２のマイクロホン１１、１２を設置することによりSN比をより大きく改善することができる。
図３はペアマイクロホン位置の説明図、図４は図３の各ペアマイクロホン位置におけるSN比改善量を示す図である。図３に示すように、３cm間隔のペアマイクロホン１１，１２を複数の位置▲１▼，▲２▼，▲３▼に設置し、それぞれのSN比の改善量(どの程度SN比が向上するか)を、1500ccの乗用車(セダン)で調べてみると、図４に示す結果が得られた。この図４より、ペアマイクロホン１１，１２を▲１▼の位置に設置したとき、すなわち、「1つのマイクロホンを話者１０の顔のほぼ真上に設置し、もう1つのマイクロホンを少し離して後頭部側に配置したとき」にSN比の改善量が最も高くなることがわかる。
【００１６】
図５はペアマイクロホン間隔の説明図、図６は図５の各ペアマイクロホン間隔におけるSN比改善量を示す図である。図５に示すように、第１のマイクロホン１１を話者１０の顔のほぼ真上に固定し、第２のマイクロホン１２を後頭部側にそれぞれ３cm，６cm，９cm，１２cm離して配置し、最適なマイクロホンの間隔について調べてみると、図６に示す結果が得られた。この図６より、２つのマイクロホン１１，１２の間隔が狭いほど、SN比の改善量は高いと考えられる。しかし、図１に示したシステムでは、間隔を０cmとするとノイズを完全に消去できるが、音声もまた完全に消去してしまう。このため、音声受音システムとして機能しないことになる。また、小型マイクロホンといえどもそれ自体の大きさがあるため、マイクロホンどうしを完全にくっつけても、マイクロホンの中心間間隔は1cm程度より小さくはならない。それゆえ、マイクロホンの間隔は、車種の違いとマイクロホンの大きさにより若干の幅があるものの、せいぜい１〜５cm程度にするのがよい。
【００１７】
図７は発声者別のSN比改善量説明図である。この図７より明らかなように、本発明のマイクロホンシステムでは、人の違いによる性能(SN比改善量)のばらつきは1dB程度であり、話者の違いによる影響は少ない。
以上２つのマイクロホンを話者の頭上に配置した場合について説明したが、２つのマイクロホンを比較的近距離に配置し、1つのマイクロホンにより、できるだけ高いSN比で音声を拾い、もう一方のマイクロホンにより、できるだけ低いSN比で音声を拾うようにできれば、配置位置は頭上に限らない。
以上、本発明を実施例により説明したが、本発明は請求の範囲に記載した本発明の主旨に従い種々の変形が可能であり、本発明はこれらを排除するものではない。
【００１８】
【発明の効果】
以上本発明によれば、各マイクロホンを接近して配置すると共に、一方のマイクロホンから出力する信号のSN比を高くし、他方のマイクロホンから出力する信号のSN比を低くするようにしたから、ノイズ出力が最小となるように適応フィルタ係数を決定しても、音声出力は零とならず、音声信号のSN比を改善することができる。すなわち、少ないマイクロホン数であるにも拘らず、高いSN比で音声を受音出力することができる。
又、本発明によれば、一方のマイクロホンを話者の顔の真上天井に配置し、他方のマイクロホンを該一方のマイクロホンの位置より１〜５cm程度後頭部側に離して配置することにより、マイクロホンが比較的近距離に配置されているにも拘らず、1つのマイクロホンはできるだけ高いSN比で音声を拾い、もう一方のマイクロホンでは、できるだけ低いSN比で音声を拾うようにできる。
【図面の簡単な説明】
【図１】本発明のマイクロホンシステムの構成図である。
【図２】人間の音声放射特性図である。
【図３】ペアマイクロホンの位置説明図である。
【図４】ペアマイクロホン位置とSN比改善量の関係図である。
【図５】ペアマイクロホンの間隔説明図である。
【図６】ペアマイクロホンの間隔とSN比改善量の関係図である。
【図７】発声者別SN比改善量説明図である。
【図８】 SN比と認識率の関係図である。
【図９】従来のマイクロホンを２つ使用した場合の高SN比受音システムである。
【図１０】目標応答設定部の特性図である。
【符号の説明】
１１，１２・・第１、第２のマイクロホン
３・・適応信号処理部
３ａ・・LMS演算部
３ｂ・・適応フィルタ
４・・目標応答設定部
５・・減算部

[0001]
[Technical field of the invention]
The present invention relates to a microphone system, and more particularly to a microphone system that outputs a speaker voice signal with an improved SN ratio by performing adaptive signal processing using output signals of two microphones.
[0002]
[Prior art]
Current speech recognition systems have reached a technical level at which a recognition rate of about 95% can be achieved when an SN ratio (S: speech / N: noise) of 15 dB or more is secured. However, when the SN ratio falls because of ambient noise, the recognition rate drops sharply with it. FIG. 8 shows the relationship between SN ratio and recognition performance, evaluated for several types of microphone (omnidirectional, unidirectional, narrow-directivity, and AMNOR (Adaptive Microphone-array for Noise Reduction)); the SN ratio and recognition rate fall largely within a band showing the S-shaped characteristic 100. As is apparent from FIG. 8, the recognition rate drops sharply as the SN ratio decreases, falling to about 50% in an environment where the SN ratio is 0 dB.
[0003]
For this reason, in an automobile cabin, where noise generated by the automobile (engine noise, road noise, pattern noise, wind noise, etc.) is present, the deterioration in recognition performance described above is unavoidable, and this is one of the major problems in installing a speech recognition system in a vehicle.
In view of these circumstances, various methods have been proposed for reducing the influence of ambient noise and receiving speech with a high SN ratio; a high-SN-ratio sound receiving system using a plurality of microphones and digital signal processing is one example. The simplest such system uses two microphones, as shown in FIG. 9, but more advanced systems such as the Griffith-Jim type array and AMNOR have also been proposed.
[0004]
In FIG. 9, reference numerals 1 and 2 denote first and second microphones, and 3 denotes an adaptive signal processing unit, which receives the error signal e and also receives the output signal x₂ of microphone 2 as a reference signal, and performs adaptive signal processing based on the LMS (Least Mean Square) algorithm so that the power of the error signal e is minimized. In the adaptive signal processing unit 3, 3a is an LMS calculation unit and 3b is an adaptive filter having, for example, an FIR digital filter configuration. The LMS calculation unit 3a determines the coefficients of the adaptive filter 3b by adaptive signal processing so that the power of the error signal e is minimized.
[0005]
Reference numeral 4 denotes a target response setting unit, which receives the noise signal output from microphone 1 as a target signal and serves to approximate the inverse characteristic of the acoustic system accurately. Letting d be the signal delay time equal to half the tap length of the adaptive filter 3b, the target response setting unit 4 has a delay characteristic of time d and a flat characteristic (gain of 1) over the audio frequency band. That is, the target response setting unit 4 has a flat frequency characteristic with a gain of 1, as shown in FIG. 10(a), and an impulse response with delay time d, as shown in FIG. 10(b). The target response setting unit 4 can be realized as an FIR digital filter in which the coefficient corresponding to the delay time d is set to 1 and all other coefficients are set to 0.
Reference numeral 5 denotes a subtracting unit, which subtracts the output signal of the adaptive filter 3b from the target response output by the target response setting unit 4 and outputs the error signal e.
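The target response setting unit described above can be illustrated with a short sketch. The following is only a minimal illustration, assuming a hypothetical tap length of 8; the function names `delay_fir`, `fir_filter`, and `magnitude_response` are invented for this example and do not appear in the patent. It builds an FIR filter whose only non-zero coefficient sits at index d, then checks that the filter delays its input by d samples and has unit gain at every sampled frequency.

```python
import cmath
import math


def delay_fir(num_taps):
    """Pure-delay FIR: coefficient 1 at index d = num_taps // 2, 0 elsewhere."""
    d = num_taps // 2
    return [1.0 if k == d else 0.0 for k in range(num_taps)]


def fir_filter(coeffs, x):
    """Direct-form FIR: y[n] = sum_k coeffs[k] * x[n - k], with x[n] = 0 for n < 0."""
    return [
        sum(c * x[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
        for n in range(len(x))
    ]


def magnitude_response(coeffs, num_points=8):
    """|H(e^{jw})| sampled at num_points frequencies from 0 up to (not including) pi."""
    mags = []
    for i in range(num_points):
        w = math.pi * i / num_points
        h = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(coeffs))
        mags.append(abs(h))
    return mags


taps = 8                      # hypothetical adaptive-filter tap length
target = delay_fir(taps)      # single 1.0 at index d = 4
delayed = fir_filter(target, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
mags = magnitude_response(target)
print(delayed)                # the input shifted right by d = 4 samples
print(mags)                   # every value is 1 to within rounding
```

Because only one coefficient is non-zero, the frequency response is a pure phase term, which is why the gain stays flat across the band.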
[0006]
At the time of non-speech recognition, only noise is input to the microphones 1 and 2, and the adaptive signal processing unit 3 determines the filter coefficient W by adaptive signal processing so that the error signal e, that is, the power of the noise output is minimized. On the other hand, at the time of speech recognition, the adaptive signal processing unit 3 does not update the filter coefficient, sets the filter coefficient W determined at the time of non-speech recognition to the adaptive filter 3b, and outputs a speech signal.
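The adapt-then-freeze operation can be sketched as follows. This is a minimal sketch of a textbook LMS update, not the patent's implementation: the step size `mu`, the tap length of 8, the synthetic white noise, and the acoustic paths in `mic_signals` are all assumptions made for the demonstration. Only the noise-only case is simulated; with speech present, the frozen call would simply be fed speech plus noise.

```python
import random


def mic_signals(noise):
    """Hypothetical acoustic paths (an assumption for this demo only):
    microphone 2 hears the noise directly, microphone 1 a filtered copy."""
    x2 = list(noise)
    x1 = [0.9 * noise[n] + (0.2 * noise[n - 1] if n >= 1 else 0.0)
          for n in range(len(noise))]
    return x1, x2


def run_filter(x1, x2, w, mu=0.0):
    """Run the two-microphone structure; mu > 0 adapts w, mu == 0 keeps it frozen."""
    num_taps = len(w)
    d = num_taps // 2                      # delay of the target response
    w = list(w)
    errors = []
    for n in range(len(x1)):
        # Most recent reference samples, newest first (zero before the start).
        window = [x2[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        target = x1[n - d] if n - d >= 0 else 0.0       # target response output
        y = sum(wk * xk for wk, xk in zip(w, window))   # adaptive filter output
        e = target - y                                  # subtracting unit
        if mu > 0.0:
            w = [wk + mu * e * xk for wk, xk in zip(w, window)]  # LMS update
        errors.append(e)
    return w, errors


def power(x):
    return sum(v * v for v in x) / len(x)


random.seed(0)
train_x1, train_x2 = mic_signals([random.gauss(0.0, 1.0) for _ in range(4000)])
w, _ = run_filter(train_x1, train_x2, [0.0] * 8, mu=0.01)  # non-speech: adapt

test_x1, test_x2 = mic_signals([random.gauss(0.0, 1.0) for _ in range(500)])
_, residual = run_filter(test_x1, test_x2, w)              # coefficients frozen
print(power(test_x1), power(residual))  # residual noise power is far below the input
```

In this toy setup the noise path is exactly representable by the filter, so the residual falls to almost nothing; real cabin noise is only partly coherent between the microphones, so the cancellation would be less complete.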
The ideal performance originally required of the system shown in FIG. 9 is to output only the speech signal Xs(z) (with zero noise output) during speech recognition. That is, for the noise output En(z) given by
En(z) = Xn₁(z)z^-d − Xn₂(z)W(z)   (1)
the adjustable parameter W (the coefficients of the adaptive filter 3b) is to be determined so that the mean of {En(z)}² takes its minimum value. Here, Xn₁(z) and Xn₂(z) are the noise signals contained in the output signals of microphones 1 and 2.
[0007]
[Problems to be solved by the invention]
Because an automobile contains many noise sources, the coherence of the cabin noise picked up by microphones 1 and 2 tends to decrease as the microphones are moved farther apart. Consequently, the farther apart the two microphones are, the harder it becomes to satisfy the condition of equation (1), so microphones 1 and 2 must be placed as close together as possible.
However, when the two microphones 1 and 2 are placed as close together as possible, it becomes more likely that nearly identical speech and nearly identical noise reach both microphones; if the adaptive filter coefficient W is then determined so as to satisfy equation (1) and cancel the noise, the speech is cancelled as well. Conversely, if W is determined so that the speech is not distorted, the noise is hardly removed and the SN ratio is hardly improved.
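The trade-off described above can be checked with single-frequency arithmetic. The sketch below is illustrative only: the function name `residual_speech` and the amplitudes are invented for the example, and it assumes the filter cancels the noise exactly at the frequency in question. It applies the same filter structure as equation (1) to the speech components.

```python
def residual_speech(xs1, xs2, xn1, xn2, delay_phase=1 + 0j):
    """Speech left at the output, at one frequency, when W cancels the noise exactly.

    xs1, xs2: speech components at microphones 1 and 2 (complex amplitudes).
    xn1, xn2: noise components at microphones 1 and 2 (complex amplitudes).
    delay_phase: value of z^-d at this frequency (1 is used here for simplicity).
    """
    w = delay_phase * xn1 / xn2          # makes xn1 * delay_phase - xn2 * w equal 0
    return xs1 * delay_phase - xs2 * w


# Both microphones receive the same speech and the same noise: speech is cancelled too.
same = residual_speech(xs1=2.0, xs2=2.0, xn1=1.0, xn2=1.0)

# Same noise at both microphones, but weaker speech at microphone 2: speech survives.
different = residual_speech(xs1=2.0, xs2=1.0, xn1=1.0, xn2=1.0)

print(abs(same), abs(different))  # 0.0 1.0
```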
[0008]
Such a problem is not unique to the system shown in FIG. 9; it arises in much the same way even when a more advanced high-SN-ratio sound receiving system such as a Griffith-Jim type array or AMNOR is adopted.
Accordingly, an object of the present invention is to make it possible to improve the SN ratio of the speech signal in the two-microphone system of FIG. 9.
[0009]
[Means for Solving the Problems]
According to the present invention, the above object is achieved by an in-vehicle microphone system comprising first and second microphones, a target signal output unit that receives the signal output from the first microphone and outputs a target signal, an adaptive signal processing unit that receives the signal output from the second microphone as a reference signal and performs adaptive signal processing, and an error signal generation unit that outputs the difference between the target signal and the output signal of the adaptive filter constituting the adaptive signal processing unit, wherein: the first and second microphones are placed close to each other, with the first microphone on the cabin ceiling directly above the speaker's face and the second microphone on the cabin ceiling about 1 to 5 cm from the first microphone toward the back of the head, so that at the first microphone the speaker's speech is louder, and the in-cabin noise other than the speaker's speech is lower relative to that speech, than at the second microphone, while at the second microphone the speaker's speech is quieter, and the in-cabin noise other than the speaker's speech is higher relative to that speech, than at the first microphone; and the adaptive signal processing unit, during non-speech recognition when only noise is input to the first and second microphones, determines the coefficients of the adaptive filter by adaptive signal processing so that the power of the error signal output from the error signal generation unit is minimized, and, during speech recognition when noise and a speech signal are input to the first and second microphones, does not update the coefficients of the adaptive filter and outputs the error signal as the speech signal.
In this way, the noise components Xn₁(z) and Xn₂(z) contained in the output signals of the two microphones can be made substantially equal, while the speech components Xs₁(z) and Xs₂(z) contained in those output signals can be made different. Therefore, even if the adaptive filter coefficient W is determined so that the mean square value of En(z) is minimized when only noise is input, the speech output Es(z) of equation (2) does not become zero, and the SN ratio of the speech signal can be improved.
[0010]
As a specific placement example, one microphone is placed directly above the speaker's face and the other is placed about 1 to 5 cm away from it toward the back of the head. With this placement, owing to the radiation characteristics of the human voice, one microphone picks up speech with as high an SN ratio as possible and the other picks up speech with as low an SN ratio as possible, even though the two microphones are relatively close together.
[0011]
[Embodiments of the invention]
(A) Configuration of the microphone system
FIG. 1 is a configuration diagram of the microphone system of the present invention; parts identical to those of the system of FIG. 9 are given the same reference numerals. In the figure, 10 is a speaker, for example the driver of an automobile, and 11 and 12 are first and second microphones. The first microphone 11 is placed on the ceiling directly above the speaker's face, and the second microphone 12 is placed on the ceiling about 1 to 5 cm from the first microphone toward the back of the head.
Reference numeral 3 denotes an adaptive signal processing unit, which receives the error signal e and also receives the output signal x₂ of microphone 2 as a reference signal, and performs adaptive signal processing based on the LMS algorithm so that the power of the error signal e is minimized. In the adaptive signal processing unit 3, 3a is an LMS calculation unit and 3b is an adaptive filter having an FIR digital filter configuration. The LMS calculation unit 3a determines the coefficients of the adaptive filter 3b by adaptive signal processing so that the power of the error signal e is minimized. The adaptive signal processing unit 3 determines the filter coefficient W of the adaptive filter 3b by adaptive signal processing only during non-speech recognition; during speech recognition it does not update the filter coefficient and sets the filter coefficient W determined during non-speech recognition in the adaptive filter 3b.
Reference numeral 4 denotes a target response setting unit, which receives the signal output from microphone 11 as a target signal, has a delay characteristic of time d, and has a flat characteristic (gain of 1) over the audio frequency band. Reference numeral 5 denotes a subtracting unit, which subtracts the output signal of the adaptive filter 3b from the target response output by the target response setting unit 4 and outputs the error signal e.
[0012]
(B) Radiation characteristics of the human voice
FIG. 2 shows the radiation characteristics of the human voice: (a) is a radiation characteristic diagram showing, for each frequency band, the speech level at a predetermined distance from the mouth in the horizontal plane containing the mouth of the speaker 10, and (b) is the corresponding diagram for the vertical plane containing the mouth of the speaker 10. In the figure, A is the characteristic for 125 Hz to 250 Hz, B for 500 Hz to 700 Hz, C for 1400 Hz to 2000 Hz, and D for 4000 Hz to 5600 Hz. As is clear from these diagrams, human speech is radiated most strongly toward the front of the speaker, and the power of the speech radiated upward, downward, and to the left and right is smaller than that radiated toward the front.
Therefore, if the first microphone 11 is placed on the ceiling directly above the speaker's face and the second microphone 12 is placed on the ceiling about 1 to 5 cm from the first microphone toward the back of the head, as in FIG. 1, then (1) the noise power received by the two microphones 11 and 12 can be made almost the same, while (2) the speech power received by the two microphones 11 and 12 can be made different. That is, the noise components Xn₁(z) and Xn₂(z) contained in the output signals of the two microphones 11 and 12 can be made substantially equal, the speech components Xs₁(z) and Xs₂(z) contained in those output signals can be made different, and it is possible to achieve [Xn₁(z)/Xn₂(z)] ≠ [Xs₁(z)/Xs₂(z)].
[0013]
(C) Operation
During non-speech recognition, when only noise is input to microphones 11 and 12, the adaptive signal processing unit 3 determines the filter coefficient W of the adaptive filter 3b by adaptive signal processing so that the mean value of {En(z)}² in the following equation is minimized:
En(z) = Xn₁(z)z^-d − Xn₂(z)W(z)   (1)
During speech recognition, on the other hand, the adaptive signal processing unit 3 does not update the filter coefficient; it sets the filter coefficient W determined during non-speech recognition in the adaptive filter 3b and outputs the speech signal. In this case, the speech signals Xs₁(z) and Xs₂(z) contained in the outputs of microphones 11 and 12 differ, and [Xn₁(z)/Xn₂(z)] ≠ [Xs₁(z)/Xs₂(z)], so the speech output Es(z) given by
Es(z) = Xs₁(z)z^-d − Xs₂(z)W(z)   (2)
is not minimized (unlike the noise, it does not become very small).
Thus, even if the adaptive filter coefficient W is determined so that the power of the noise output En(z) of equation (1) becomes zero, the speech output Es(z) of equation (2) does not become small in the way the noise does, and the SN ratio of the speech signal can be improved.
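The step from equations (1) and (2) to this conclusion can be written out explicitly. The following is a worked idealization rather than part of the original text: it assumes the adaptation reaches the exact noise-cancelling solution En(z) = 0 and that Xn₂(z) and Xs₂(z) are non-zero.

```latex
\begin{aligned}
E_n(z) = 0 \;&\Rightarrow\; W(z) = z^{-d}\,\frac{X_{n1}(z)}{X_{n2}(z)} \\[4pt]
E_s(z) &= X_{s1}(z)\,z^{-d} - X_{s2}(z)\,W(z) \\
       &= z^{-d}\,X_{s2}(z)\left[\frac{X_{s1}(z)}{X_{s2}(z)} - \frac{X_{n1}(z)}{X_{n2}(z)}\right]
\end{aligned}
```

Under these assumptions the bracketed term vanishes only when the speech ratio equals the noise ratio, so the speech output stays non-zero exactly when [Xn₁(z)/Xn₂(z)] ≠ [Xs₁(z)/Xs₂(z)].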
[0014]
In summary, as shown in FIG. 1, placing microphones 11 and 12 relatively close together limits the loss of coherence between the noise signals output by the two microphones; in addition, by taking into account the radiation characteristics of the human voice shown in FIG. 2, microphone 11 picks up speech with as high an SN ratio as possible and microphone 12 picks up speech with as low an SN ratio as possible, even though the two are relatively close together. As a result, even if the adaptive filter coefficient W is determined so that the noise output becomes zero, the speech output does not become small in the way the noise does, and the SN ratio of the speech signal can be improved.
[0015]
(D) Examination of microphone position and SN ratio improvement
The radiation characteristics of FIG. 2 show that speech uttered by a person and radiated into space is attenuated particularly strongly toward the back of the head, where its level is lower than that of the speech radiated toward the front. The microphone system of the present invention is therefore basically installed from just above the head toward the back of the head, as shown in FIG. 1; placing the first and second microphones 11 and 12 in this way yields a larger improvement in SN ratio.
FIG. 3 is an explanatory diagram of the pair-microphone positions, and FIG. 4 shows the SN ratio improvement at each pair-microphone position of FIG. 3. As shown in FIG. 3, the pair of microphones 11 and 12, spaced 3 cm apart, was installed at positions (1), (2), and (3), and the SN ratio improvement at each position (how much the SN ratio increases) was measured in a 1500 cc passenger car (sedan), giving the results shown in FIG. 4. FIG. 4 shows that the SN ratio improvement is greatest when the pair of microphones 11 and 12 is installed at position (1), that is, when one microphone is placed almost directly above the face of the speaker 10 and the other is placed slightly away from it toward the back of the head.
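The text does not state how the SN ratio improvement was computed. One plausible reading, offered here purely as an assumption, is the output SN ratio minus the SN ratio at the first microphone, both in dB. The sketch below uses made-up sample values and invented function names; because the frozen filter is linear, the speech and noise components can be passed through it separately to obtain the output-side signals.

```python
import math


def power(x):
    """Mean square value of a sample sequence."""
    return sum(v * v for v in x) / len(x)


def snr_db(speech, noise):
    """SN ratio in dB from separately available speech and noise components."""
    return 10.0 * math.log10(power(speech) / power(noise))


def snr_improvement_db(speech_in, noise_in, speech_out, noise_out):
    """Assumed definition: output SN ratio minus input SN ratio, in dB."""
    return snr_db(speech_out, noise_out) - snr_db(speech_in, noise_in)


# Made-up sample values, chosen only to make the arithmetic easy to follow.
speech_in = [1.0, -1.0, 1.0, -1.0]         # power 1
noise_in = [1.0, 1.0, -1.0, -1.0]          # power 1, so the input SN ratio is 0 dB
speech_out = [0.5, -0.5, 0.5, -0.5]        # speech attenuated to power 0.25
noise_out = [0.05, 0.05, -0.05, -0.05]     # noise attenuated to power 0.0025

improvement = snr_improvement_db(speech_in, noise_in, speech_out, noise_out)
print(round(improvement, 6))  # about 20 dB: the noise dropped far more than the speech
```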
[0016]
FIG. 5 is an explanatory diagram of the pair-microphone spacing, and FIG. 6 shows the SN ratio improvement at each spacing of FIG. 5. As shown in FIG. 5, the first microphone 11 was fixed almost directly above the face of the speaker 10 and the second microphone 12 was placed 3 cm, 6 cm, 9 cm, and 12 cm away toward the back of the head in order to find the optimum microphone spacing, giving the results shown in FIG. 6. FIG. 6 suggests that the narrower the spacing between the two microphones 11 and 12, the greater the SN ratio improvement. In the system of FIG. 1, however, a spacing of 0 cm would cancel the noise completely but would also cancel the speech completely, so the system would not function as a speech receiving system. Moreover, even a small microphone has a finite size, so the center-to-center spacing cannot be made smaller than about 1 cm even when the microphones are in contact. The microphone spacing is therefore best kept to about 1 to 5 cm at most, with some latitude depending on the vehicle model and the size of the microphones.
[0017]
FIG. 7 is an explanatory diagram of the SN ratio improvement amount for each speaker. As can be seen from FIG. 7, in the microphone system of the present invention, the variation in performance (SN ratio improvement amount) due to the difference between persons is about 1 dB, and the influence due to the difference between speakers is small.
The above describes the case in which the two microphones are placed above the speaker's head; however, the placement is not limited to overhead, provided that the two microphones are placed relatively close together and one picks up speech with as high an SN ratio as possible while the other picks up speech with as low an SN ratio as possible.
The present invention has been described with reference to the embodiments. However, the present invention can be variously modified in accordance with the gist of the present invention described in the claims, and the present invention does not exclude these.
[0018]
[Effect of the invention]
As described above, according to the present invention, the microphones are placed close to each other, and the SN ratio of the signal output from one microphone is made high while that of the signal output from the other is made low. Consequently, even if the adaptive filter coefficients are determined so that the noise output is minimized, the speech output does not become zero, and the SN ratio of the speech signal can be improved. That is, speech can be received and output with a high SN ratio despite the small number of microphones.
Furthermore, according to the present invention, one microphone is placed on the ceiling directly above the speaker's face and the other is placed about 1 to 5 cm away from it toward the back of the head, so that one microphone picks up speech with as high an SN ratio as possible and the other picks up speech with as low an SN ratio as possible, even though the microphones are relatively close together.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of a microphone system of the present invention.
FIG. 2 is a diagram of the radiation characteristics of the human voice.
FIG. 3 is a diagram illustrating the position of a pair microphone.
FIG. 4 is a relationship diagram between a pair microphone position and an SN ratio improvement amount.
FIG. 5 is an explanatory diagram of a pair microphone interval.
FIG. 6 is a relationship diagram between a pair microphone interval and an SN ratio improvement amount.
FIG. 7 is an explanatory diagram of an SN ratio improvement amount for each speaker.
FIG. 8 is a relationship diagram between SN ratio and recognition rate.
FIG. 9 shows a conventional high-SN-ratio sound receiving system using two microphones.
FIG. 10 is a characteristic diagram of a target response setting unit.
[Explanation of symbols]
11, 12 ... First and second microphones
3 ... Adaptive signal processing unit
3a ... LMS calculation unit
3b ... Adaptive filter
4 ... Target response setting unit
5 ... Subtracting unit

Claims

An in-vehicle microphone system comprising first and second microphones, a target signal output unit that receives a signal output from the first microphone and outputs a target signal, an adaptive signal processing unit that receives a signal output from the second microphone as a reference signal and performs adaptive signal processing, and an error signal generation unit that outputs the difference between the target signal and an output signal of an adaptive filter constituting the adaptive signal processing unit, wherein:
the first and second microphones are placed close to each other, the first microphone being placed on the cabin ceiling directly above the speaker's face and the second microphone being placed on the cabin ceiling about 1 to 5 cm from the position of the first microphone toward the back of the head, so that at the first microphone the speaker's speech is louder, and the in-cabin noise other than the speaker's speech is lower relative to that speech, than at the second microphone, while at the second microphone the speaker's speech is quieter, and the in-cabin noise other than the speaker's speech is higher relative to that speech, than at the first microphone; and
the adaptive signal processing unit, during non-speech recognition in which only noise is input to the first and second microphones, determines the coefficients of the adaptive filter by adaptive signal processing so that the power of the error signal output from the error signal generation unit is minimized, and, during speech recognition in which noise and a speech signal are input to the first and second microphones, outputs the error signal as a speech signal without updating the coefficients of the adaptive filter.