JP2001242888A

JP2001242888A - Speech recognition system, speech recognition method and recording medium

Info

Publication number: JP2001242888A
Application number: JP2000057941A
Authority: JP
Inventors: Yoshinaga Kato; 喜永加藤
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2000-02-29
Filing date: 2000-02-29
Publication date: 2001-09-07
Anticipated expiration: 2020-02-29
Also published as: JP4201455B2

Abstract

PROBLEM TO BE SOLVED: To perform speech recognition accurately even when the background noise environment varies while voice is input from a communication terminal. SOLUTION: The system comprises a communication terminal 1 and a storage means 2 connected to the terminal 1 through a communication network 200. The terminal 1 has a voice input means 11 that receives voice and environmental noise, a feature extraction means 12 that extracts a feature vector from the voice input through the means 11, a speech recognition means 13 that recognizes the voice by collating the feature vector with prescribed recognition model parameters, and a noise verification means 14 that verifies the environmental noise input through the means 11. When the means 14 verifies environmental noise, it selects, from the multiple recognition model parameters stored in the means 2, the recognition model parameter corresponding to the kind of verified noise. The means 13 then performs speech recognition using the recognition model parameter selected by the means 14.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a speech recognition system, a speech recognition method, and a recording medium.

[0002]

[Prior Art] In general, when voice input from a mobile communication terminal is to be recognized, a speech recognition device connected to a communication network is used, so the input voice from the mobile communication terminal is transmitted to the speech recognition device via the communication network. Mobile communication terminals are used in a variety of places, such as roadsides and station platforms. As a result, voice input from a mobile communication terminal is affected by background noise that differs with the usage environment, even for the same user. Moreover, because the voice from the mobile communication terminal is sent over the communication network, it is also affected by communication noise. Since the communication path is not necessarily constant, recognizing voice transmitted over a communication path has conventionally suffered from the problem that background noise and channel noise interact in complex ways and degrade speech recognition performance.

[0003]

[Problems to be Solved by the Invention] To address this problem, Japanese Patent Application Laid-Open No. 10-282990, for example, discloses a technique for improving recognition accuracy by having a speech recognition device learn the user's voice.

[0004] However, even with the conventional technique described above, recognition accuracy deteriorates when the waveform of the voice input from the mobile communication terminal is affected by background noise or by changes in the communication path.

[0005] An object of the present invention is to provide a speech recognition system, a speech recognition method, and a recording medium capable of accurate speech recognition even in the various situations in which the background noise environment changes while voice is input from a communication terminal such as a mobile communication terminal.

[0006]

[Means for Solving the Problems] To achieve the above object, the invention according to claim 1 comprises a communication terminal and storage means connected to the communication terminal via a communication network. The communication terminal has: voice input means to which voice and/or ambient noise is input; feature extraction means for extracting a feature amount of the voice input from the voice input means; speech recognition means for recognizing the voice by collating the feature amount with a predetermined recognition model parameter set; and noise verification means for verifying the ambient noise input from the voice input means. When the noise verification means has verified the ambient noise, it selects, from among a plurality of recognition model parameter sets stored in the storage means, the recognition model parameter set corresponding to the type of verified noise, and the speech recognition means performs speech recognition using the recognition model parameter set selected by the noise verification means.

[0007] The invention according to claim 2 comprises a communication terminal, speech recognition means connected to the communication terminal via a communication network, and storage means connected to the communication terminal via the communication network. The communication terminal has: voice input means to which voice and/or ambient noise is input; feature extraction means for extracting a feature amount of the voice input from the voice input means; and noise verification means for verifying the ambient noise input from the voice input means. When the noise verification means has verified the ambient noise, it selects, from among a plurality of recognition model parameter sets stored in the storage means, the recognition model parameter set corresponding to the type of verified noise. When the feature amount extracted by the feature extraction means of the communication terminal is sent from the communication terminal, the speech recognition means performs speech recognition by collating the received feature amount with the recognition model parameter set selected by the noise verification means.

[0008] The invention according to claim 3 comprises a communication terminal, speech recognition means connected to the communication terminal via a communication network, noise verification means connected to the communication terminal via the communication network, and storage means connected to the communication terminal via the communication network. The communication terminal has voice input means to which voice and/or ambient noise is input, and feature extraction means for extracting a feature amount of the voice input from the voice input means. When the noise verification means has verified the ambient noise input from the voice input means, it selects, from among a plurality of recognition parameter sets stored in the storage means, the recognition model parameter set corresponding to the type of verified noise. When the feature amount extracted by the feature extraction means of the communication terminal is sent from the communication terminal, the speech recognition means performs speech recognition by collating the received feature amount with the recognition model parameter set selected by the noise verification means.

[0009] The invention according to claim 4 comprises a communication terminal, speech recognition means connected to the communication terminal via a communication network, noise verification means connected to the communication terminal via the communication network, feature extraction means connected to the communication terminal via the communication network, and storage means connected to the communication terminal via the communication network. The communication terminal has voice input means to which voice and/or ambient noise is input. When the voice input through the voice input means of the communication terminal is sent from the communication terminal, the feature extraction means extracts a feature amount of that voice. When the ambient noise input through the voice input means of the communication terminal is sent from the communication terminal, the noise verification means verifies the ambient noise and selects, from among a plurality of recognition parameter sets stored in the storage means, the recognition parameter set corresponding to the type of verified noise. The speech recognition means performs speech recognition by collating the feature amount extracted by the feature extraction means with the recognition model parameter set selected by the noise verification means.

[0010] The invention according to claim 5 is characterized in that, in the speech recognition system according to any one of claims 1 to 4, the communication terminal is a mobile communication terminal.

[0011] The invention according to claim 6 comprises feature extraction means for extracting a feature amount of voice input from a communication terminal, speech recognition means for recognizing the voice from the feature amount, noise verification means for verifying ambient noise input from the communication terminal, and storage means connected to the communication terminal via a communication network. It is characterized in that the speech recognition means performs speech recognition using a recognition model parameter set that has been selected, by means of the noise verification means, from among a plurality of recognition model parameter sets stored in the storage means.

[0012] The invention according to claim 7 is characterized in that, in the speech recognition method according to claim 6, the recognition model parameter set stored in the speech recognition means is transferred to the storage means via the communication network.

[0013] The invention according to claim 8 is characterized in that, in the speech recognition method according to claim 6 or claim 7, the feature extraction means, the speech recognition means, and the noise verification means are provided in the communication terminal; a recognition model parameter set stored in the storage means is selected using the noise verification means; and the selected recognition model parameter set is fetched into the communication terminal via the communication network and stored in the speech recognition means.

[0014] The invention according to claim 9 is characterized in that, in the speech recognition method according to claim 6, the feature extraction means and the noise verification means are provided in the communication terminal; the speech recognition means is connected to the communication terminal via the communication network; a recognition model parameter set stored in the storage means is selected using the noise verification means; and the selected recognition model parameter set is used by the speech recognition means.

[0015] The invention according to claim 10 is characterized in that, in the speech recognition method according to claim 6, the feature extraction means is provided in the communication terminal; the noise verification means and the speech recognition means are connected to the communication terminal via the communication network; a recognition model parameter set stored in the storage means is selected using the noise verification means; and the selected recognition model parameter set is used by the speech recognition means.

[0016] The invention according to claim 11 is characterized in that, in the speech recognition method according to claim 6, the feature extraction means, the noise verification means, and the speech recognition means are connected to the communication terminal via the communication network; a recognition model parameter set stored in the storage means is selected using the noise verification means; and the selected recognition model parameter set is used by the speech recognition means.

[0017] The invention according to claim 12 is characterized in that, in the speech recognition method according to any one of claims 6 to 11, the verification model of the noise verification means and the recognition model parameters of the speech recognition means and/or the storage means are updated using the input voice feature amount.

[0018] The invention according to claim 13 is characterized in that, in the speech recognition method according to any one of claims 6 to 11, a verification model of the noise verification means and a recognition model parameter set of the storage means can be added, and the added verification model and the added recognition model parameter set are updated using the input voice feature amount.

[0019] The invention according to claim 14 is a computer-readable recording medium on which is recorded a program for causing a computer to execute processing that, when ambient noise has been verified, selects the recognition model parameter set corresponding to the type of verified noise from among a plurality of recognition model parameter sets stored in storage means, and performs speech recognition using the selected recognition model parameter set.

[0020]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 shows a configuration example of a first embodiment of the speech recognition system according to the present invention. Referring to FIG. 1, this speech recognition system comprises a communication terminal (for example, a mobile communication terminal such as a mobile phone) 1 and storage means 2 connected to the communication terminal 1 via a communication network 200.

[0021] The communication terminal 1 has voice input means 11 to which voice and/or ambient noise is input, feature extraction means 12 for extracting a feature amount of the voice input from the voice input means 11, speech recognition means 13 for recognizing the voice by collating the feature amount with a predetermined recognition model parameter set, noise verification means 14 for verifying the ambient noise input from the voice input means 11, and a communication terminal central processing unit 15.

[0022] The feature extraction means 12 can use well-known LPC (linear prediction) analysis or the like. For example, with analysis conditions of sampling frequency 8 kHz, high-frequency emphasis by first-order difference, a 256-point Hamming window, a frame shift of 16 ms, and an LPC analysis order of 20, a feature amount consisting of 10-dimensional mel-cepstrum coefficients, the first-order difference of log power, and log power can be extracted frame by frame. The voice analysis is not limited to the above; any other technique, such as frequency analysis, may be used.
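The framing conditions given above (8 kHz sampling, first-order-difference emphasis, 256-point Hamming window, 16 ms shift) can be illustrated with a minimal sketch. It computes only the log power and its first-order difference per frame; the LPC analysis and the mel-cepstrum coefficients are deliberately omitted. The function names and the small constant added before taking the logarithm are assumptions made for illustration and do not come from the patent.

```python
import math

SAMPLE_RATE_HZ = 8000                        # sampling frequency from the text
FRAME_LEN = 256                              # 256-point Hamming window
FRAME_SHIFT = SAMPLE_RATE_HZ * 16 // 1000    # 16 ms shift = 128 samples


def pre_emphasize(samples):
    """High-frequency emphasis by first-order difference: y[n] = x[n] - x[n-1]."""
    if not samples:
        return []
    return [samples[0]] + [samples[i] - samples[i - 1] for i in range(1, len(samples))]


def hamming(n):
    """Standard n-point Hamming window."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def frame_log_power(samples):
    """Return one (log_power, delta_log_power) pair per analysis frame."""
    emphasized = pre_emphasize(samples)
    window = hamming(FRAME_LEN)
    log_powers = []
    for start in range(0, len(emphasized) - FRAME_LEN + 1, FRAME_SHIFT):
        frame = [emphasized[start + i] * window[i] for i in range(FRAME_LEN)]
        power = sum(v * v for v in frame) / FRAME_LEN
        log_powers.append(math.log(power + 1e-10))  # offset avoids log(0); an assumption
    # First-order difference of log power; the first frame has no predecessor.
    deltas = [0.0] + [log_powers[i] - log_powers[i - 1] for i in range(1, len(log_powers))]
    return list(zip(log_powers, deltas))
```

With one second of audio (8000 samples), this framing yields 61 frames, since the last full 256-sample window starts at sample 7680.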

[0023] The storage means 2 stores a plurality of recognition model parameter sets P1, ..., Pn. That is, the storage means 2 stores recognition model parameter sets P1, ..., Pn, each trained under a different noise environment.

[0024] When the noise verification means 14 has verified the ambient noise, it selects, from among the plurality of recognition model parameter sets P1, ..., Pn stored in the storage means 2, the recognition model parameter set corresponding to the type of verified noise, and the speech recognition means 13 performs speech recognition using the recognition model parameter set P selected by the noise verification means 14.

[0025] FIG. 2 shows a configuration example of the noise verification means 14. Referring to FIG. 2, the noise verification means 14 is provided with verification models Q1, ..., Qn prepared for each type of noise, and a comparator 16 that compares the feature amount of the background noise extracted by the feature extraction means 12 with each of the verification models Q1, ..., Qn. Each verification model Q1, ..., Qn is created in advance using a representative pattern of the corresponding noise or a model such as an HMM.

[0026] The numbers 1 to n of the recognition model parameter sets P1, ..., Pn stored in the storage means 2 correspond to the numbers 1 to n of the verification models Q1, ..., Qn of the noise verification means 14.

[0027] Next, the processing operation of the speech recognition system of FIG. 1 configured as above is described. When voice is input from the voice input means 11 (for example, the microphone of a mobile phone) of the communication terminal 1, the feature extraction means 12 extracts the feature amount of the input voice.

[0028] When speech recognition is performed, switch SW1 is set to side A, the speech recognition means 13 side. The recognition model parameter set P is stored in the storage area of the communication terminal 1 as parameters capable of representing the models of all recognition units (for example, phonemes) to be compared. Any parameters may be used as long as they can represent the recognition models; for example, the parameters of a well-known probabilistic model such as an HMM (hidden Markov model), or representative patterns of the items to be matched, can be used.

[0029] The speech recognition means 13 compares the recognition model parameters P currently stored in the storage area of the communication terminal 1 with the voice feature amount extracted by the feature extraction means 12, and passes the comparison result to the communication terminal central processing unit 15, which performs processing according to the application. For example, the name of a callee spoken by the user is recognized, and the telephone number registered for that name in the communication terminal is dialed.

[0030] When speech recognition is performed as described above, the type of background noise changes with the place of use, so misrecognition may occur frequently. To resolve this, in the present invention, before switch SW1 is set to side A (the speech recognition means 13 side), it is set to side B (the noise verification means 14 side) and the current background noise is input from the voice input means 11, so that the noise condition can be verified before speech recognition is performed.

[0031] In this case, the noise verification means 14 uses the comparator 16 to compare the feature amount of the background noise extracted by the feature extraction means 12 with the n verification models Q1, ..., Qn, and selects the number of the most similar verification model. For example, when the verification models are HMMs, the verification model giving the highest likelihood for the noise feature amount is selected. The noise verification means 14 sends the number of the verification model selected in this way to the communication terminal central processing unit 15, which then uses its data communication function to notify the storage means 2 of the verification model number via the communication network 200.
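The selection step performed by the comparator can be sketched as a maximum-likelihood choice among the verification models. As a simplifying assumption, each verification model Qi is represented here by a single diagonal-covariance Gaussian (a mean and a variance per feature dimension) rather than an HMM; the function names and the dictionary layout are hypothetical.

```python
import math


def log_likelihood(frames, mean, var):
    """Sum of diagonal-Gaussian log densities over all noise feature frames."""
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, mean, var):
            total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total


def select_verification_model(frames, models):
    """Return the number i of the verification model Qi with the highest likelihood.

    `models` maps a model number to a (mean, variance) pair; this layout is
    an assumption made for the sketch.
    """
    best_number, best_score = None, -math.inf
    for number, (mean, var) in models.items():
        score = log_likelihood(frames, mean, var)
        if score > best_score:
            best_number, best_score = number, score
    return best_number
```

For instance, with `models = {1: ([0.0, 0.0], [1.0, 1.0]), 2: ([5.0, 5.0], [1.0, 1.0])}` and noise frames `[[4.8, 5.1], [5.2, 4.9]]`, the function returns 2, which is the number the terminal would then report to the storage means.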

[0032] Suppose, for example, that the noise verification means 14 selects verification model Q2 (number 2) from among the n verification models Q1, ..., Qn, and that the communication terminal central processing unit 15 notifies the storage means 2 of this via the communication network 200. The storage means 2 then selects the recognition model parameter set corresponding to the notified verification model number 2 (in this case, recognition model parameter set P2) and returns it to the communication terminal 1 (that is, to the communication terminal central processing unit 15) via the communication network 200; in other words, the set is downloaded. The communication terminal central processing unit 15 then replaces the current recognition model parameter set P held in the storage area of the communication terminal 1 with the recognition model parameter set (for example, P2) downloaded from the storage means 2.

[0033] However, as described later, if the content of the recognition model parameter set currently stored in the storage area of the communication terminal 1 has been updated, then before the above download is performed, the recognition model parameter set P currently stored in the storage area of the communication terminal 1 is uploaded to the storage means 2, and only then is the recognition model parameter set in the terminal rewritten. For example, if recognition model parameter set P1 is currently stored in the storage area of the communication terminal 1, the content of recognition model parameter set P1 held in the storage means 2 is first overwritten with the P1 currently stored in the terminal (upload), after which recognition parameter set P2 is downloaded from the storage means 2 to the storage area of the communication terminal 1. Through this processing, the communication terminal 1 can use the recognition model parameter set P best adapted to the current noise, which improves speech recognition accuracy. Furthermore, if the noise environment changes and a recognition model parameter set used in the past is needed again, it can be downloaded to the communication terminal 1 again by selecting it from the storage means 2.
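The download and upload-before-download behaviour described above can be sketched with two small classes. This is only an illustration under stated assumptions: the class names, the `updated` flag, the `adapt` method, and the use of in-memory dictionaries in place of a real network transfer are all hypothetical and are not specified in the patent.

```python
class StorageMeans:
    """Stand-in for storage means 2: parameter sets P1..Pn keyed by number."""

    def __init__(self, parameter_sets):
        self._sets = {number: dict(params) for number, params in parameter_sets.items()}

    def download(self, number):
        # Return a copy, as if the set had been sent over the network.
        return dict(self._sets[number])

    def upload(self, number, params):
        # Overwrite the stored set with the terminal's version.
        self._sets[number] = dict(params)


class Terminal:
    """Stand-in for communication terminal 1 holding one parameter set P."""

    def __init__(self, storage, number):
        self.storage = storage
        self.number = number
        self.params = storage.download(number)
        self.updated = False  # True once the local set has been modified

    def adapt(self, key, value):
        """Hypothetical local update of the current parameter set."""
        self.params[key] = value
        self.updated = True

    def switch_to(self, number):
        """Switch to the set matching the verified noise type."""
        if self.updated:
            # Upload the locally updated set before it is overwritten.
            self.storage.upload(self.number, self.params)
            self.updated = False
        self.params = self.storage.download(number)
        self.number = number
```

Because the updated set is written back before the download, switching away from set 1 and later returning to it recovers the adapted parameters rather than the original ones.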

[0034] As described above, the first embodiment comprises feature extraction means for extracting a feature amount of voice input from a communication terminal, speech recognition means for recognizing the voice from the feature amount, noise verification means for verifying ambient noise input from the communication terminal, and storage means connected to the communication terminal via a communication network, and the speech recognition means performs speech recognition using a recognition model parameter set selected, by means of the noise verification means, from among the plurality of recognition model parameter sets stored in the storage means. Speech recognition can therefore be performed accurately even in the various situations in which the background noise environment changes while voice is input from a communication terminal such as a mobile communication terminal. In addition, in the first embodiment the storage means 2 is not provided inside the communication terminal 1 but is connected to it via the communication network 200, which saves storage capacity in the communication terminal 1.

[0035] FIG. 3 shows a configuration example of a second embodiment of the speech recognition system according to the present invention. In FIG. 3, parts corresponding to those in FIG. 1 are given the same reference numerals. Referring to FIG. 3, this speech recognition system comprises a communication terminal (for example, a mobile communication terminal such as a mobile phone) 21, and speech recognition means 23 and storage means 2 connected to the communication terminal 21 via a communication network 300.

[0036] The communication terminal 21 has voice input means 11 to which voice and/or ambient noise is input, feature extraction means 12 for extracting a feature amount of the voice input from the voice input means 11, noise verification means 14 for verifying the ambient noise input from the voice input means 11, and a communication terminal central processing unit 25.

[0037] The voice input means 11, feature extraction means 12, noise verification means 14, and storage means 2 have the same configuration and functions as described for FIG. 1.

[0038] The configuration of FIG. 3 omits from the communication terminal 21 the speech recognition means 13 and the storage area for holding the recognition model parameter set P that are provided in the communication terminal 1 of FIG. 1, and instead connects speech recognition means 23 to the communication network 300.

【００３９】この第２の実施形態では、音声認識を行な
う場合は、スイッチＳＷ１をＡの側に入れる。これによ
り、通信端末２１の特徴抽出手段１２により得られた特
徴量は、通信網３００を介して音声認識手段２３に伝送
され、音声認識手段２３では、伝送された特徴量を用い
て音声認識が行われる。この際、音声認識手段２３は、
記憶手段２で選択されている認識モデルパラメータセッ
トを直接参照して音声認識を行うことができる。In the second embodiment, speech recognition is performed with the switch SW1 set to the A side. The feature amount obtained by the feature extraction means 12 of the communication terminal 21 is then transmitted to the speech recognition means 23 via the communication network 300, and the speech recognition means 23 performs speech recognition using the transmitted feature amount. At this time, the speech recognition means 23 can perform speech recognition by directly referring to the recognition model parameter set selected in the storage means 2.

【００４０】この第２の実施形態においても、騒音環境
が変化した場合は、第１の実施形態と同様に、通信端末
２１側の騒音検証手段１４を用いて、検証モデルの番号
を通信網３００を介して記憶手段２へ通知し、記憶手段
２における認識パラメータセットを選択し直せばよい。In the second embodiment as well, when the noise environment changes, the noise verification means 14 on the communication terminal 21 side is used, as in the first embodiment, to notify the storage means 2 of the verification model number via the communication network 300, so that the recognition parameter set in the storage means 2 is selected again.
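The data flow of the second embodiment can be sketched as follows. This is a minimal illustration and not the patent's implementation: the class and function names, the toy two-element feature vector, and the nearest-template matching are all assumptions introduced here. The only points taken from the text are that the terminal extracts the features, that the network-side recognizer reads the currently selected parameter set directly from the storage means, and that a verification model number reported by the terminal re-selects that set.

```python
from dataclasses import dataclass


@dataclass
class StorageMeans:
    """Network-side store of recognition model parameter sets (storage means 2)."""
    parameter_sets: dict  # set number -> {word: template feature vector}
    selected: int = 1

    def select(self, number):
        # Called when the terminal reports a new verification model number.
        if number not in self.parameter_sets:
            raise KeyError(f"unknown parameter set {number}")
        self.selected = number

    def current(self):
        return self.parameter_sets[self.selected]


def extract_features(samples):
    """Terminal side: toy feature vector (mean, mean energy) of one utterance."""
    n = len(samples)
    return [sum(samples) / n, sum(s * s for s in samples) / n]


def recognize(features, storage):
    """Network side: return the word whose template is closest to the features."""
    models = storage.current()

    def distance(word):
        return sum((f - m) ** 2 for f, m in zip(features, models[word]))

    return min(models, key=distance)
```

In this sketch only the feature vector and the set number cross the network, which mirrors why the terminal no longer needs to hold a recognizer or a parameter set.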

【００４１】このように、この第２の実施形態では、特
徴抽出手段と騒音検証手段とを前記通信端末に備え、ま
た、通信端末と通信網を介して音声認識手段を備え、前
記騒音検証手段を用いて前記記憶手段に記憶されている
認識モデルパラメータセットを選択し、選択された認識
モデルパラメータセットを音声認識手段に用いるので、
移動用通信端末などの通信端末からの音声入力時の背景
騒音環境が変化する様々な場面においても、精度よく音
声認識を行うことができる。さらに、この第２の実施形
態では、通信端末２１側に、音声認識手段と認識モデル
パラメータセットを保存するための記憶領域とを持つ必
要がなくなるので、通信端末２１の記憶容量をより一層
節約することができる。As described above, in the second embodiment, the feature extraction means and the noise verification means are provided in the communication terminal, the speech recognition means is connected to the communication terminal via the communication network, a recognition model parameter set stored in the storage means is selected using the noise verification means, and the selected recognition model parameter set is used by the speech recognition means. Speech recognition can therefore be performed accurately even in various situations where the background noise environment changes while speech is input from a communication terminal such as a mobile communication terminal. Furthermore, in the second embodiment, the communication terminal 21 does not need to have a speech recognition means or a storage area for storing a recognition model parameter set, so the storage capacity of the communication terminal 21 can be saved even further.

【００４２】図４は本発明に係る音声認識システムの第
３の実施形態の構成例を示す図である。なお、図４にお
いて図１，図３と同様の箇所には同じ符号を付してい
る。図４を参照すると、この音声認識システムは、通信
端末（例えば、携帯電話などの移動用通信端末）３１
と、通信端末３１と通信網４００を介して接続された音
声認識手段２３，騒音検証手段３４，記憶手段２とを備
えている。FIG. 4 is a diagram showing a configuration example of a third embodiment of the speech recognition system according to the present invention. In FIG. 4, the same parts as those in FIGS. 1 and 3 are denoted by the same reference numerals. Referring to FIG. 4, the speech recognition system includes a communication terminal (for example, a mobile communication terminal such as a mobile phone) 31, and a speech recognition means 23, a noise verification means 34, and a storage means 2 that are connected to the communication terminal 31 via a communication network 400.

【００４３】ここで、通信端末３１は、音声および／ま
たは周囲の騒音が入力される音声入力手段１１と、音声
入力手段１１から入力された音声の特徴量を抽出する特
徴抽出手段１２と、通信端末中央処理装置３５とを有し
ている。Here, the communication terminal 31 has a voice input means 11 to which voice and/or ambient noise is input, a feature extraction means 12 that extracts a feature amount of the voice input from the voice input means 11, and a communication terminal central processing unit 35.

【００４４】なお、音声入力手段１１，特徴抽出手段１
２，音声認識手段２３，記憶手段２は、図１，図３にお
いて説明したと同様の構成および機能のものとなってい
る。The voice input means 11, the feature extraction means 12, the speech recognition means 23, and the storage means 2 have the same configuration and functions as those described with reference to FIGS. 1 and 3.

【００４５】図４の構成は、図３の通信端末２１内に設
けられている騒音検証手段１４を通信端末３１内には設
けずに、騒音検証手段３４として通信網４００に接続し
たものである。In the configuration of FIG. 4, the noise verification means 14 provided in the communication terminal 21 of FIG. 3 is not provided in the communication terminal 31 but is instead connected to the communication network 400 as the noise verification means 34.

【００４６】この第３の実施形態では、音声認識を行う
場合には、スイッチＳＷ１を音声認識手段２３側Ａへ入
れる。一方、認識モデルパラメータセットを変更する場
合には、認識モデルパラメータセットの番号を取得する
ため、スイッチＳＷ１を騒音検証手段３４側Ｂへ入れて
通信端末３１側から伝送されてきた騒音の特徴量を検証
することができる。In the third embodiment, speech recognition is performed with the switch SW1 set to position A on the speech recognition means 23 side. To change the recognition model parameter set, on the other hand, the switch SW1 is set to position B on the noise verification means 34 side in order to obtain the number of the recognition model parameter set, so that the feature amount of the noise transmitted from the communication terminal 31 can be verified.
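The role of the switch SW1 in the third embodiment can be pictured as a simple dispatch on the network side. The sketch below is an assumption-laden illustration: the function names are hypothetical, and choosing the verification model with the smallest squared distance is a stand-in for whatever comparison the noise verification means actually performs. Only the A/B routing itself comes from the text.

```python
def verify_noise(noise_features, verification_models):
    """Return the number of the verification model closest to the noise features.

    verification_models maps a model number to a reference feature vector.
    Nearest-by-squared-distance is an assumption made for this sketch.
    """
    def distance(number):
        model = verification_models[number]
        return sum((f - m) ** 2 for f, m in zip(noise_features, model))

    return min(verification_models, key=distance)


def route(switch_position, features, recognizer, verifier):
    """Dispatch features received from the terminal according to SW1."""
    if switch_position == "A":
        return recognizer(features)   # toward speech recognition means 23
    if switch_position == "B":
        return verifier(features)     # toward noise verification means 34
    raise ValueError(f"unknown switch position: {switch_position!r}")
```

With SW1 at B the call returns a parameter set number that can then be used to re-select the set in the storage means; with SW1 at A the same feature stream goes to recognition instead.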

【００４７】このように、第３の実施形態では、特徴抽
出手段を前記通信端末に備え、また、通信端末と通信網
を介して騒音検証手段，音声認識手段を備え、前記騒音
検証手段を用いて前記記憶手段に記憶されている認識モ
デルパラメータセットを選択し、選択された認識モデル
パラメータセットを音声認識手段に用いるので、移動用
通信端末などの通信端末からの音声入力時の背景騒音環
境が変化する様々な場面においても、精度よく音声認識
を行うことができる。さらに、この第３の実施形態で
は、通信端末３１側に、音声認識手段と認識モデルパラ
メータセットを保存するための記憶領域と騒音検証手段
とを持つ必要がなくなるので、通信端末３１の記憶容量
を図３の場合よりもさらに一層節約することができる。As described above, in the third embodiment, the feature extraction means is provided in the communication terminal, the noise verification means and the speech recognition means are connected to the communication terminal via the communication network, a recognition model parameter set stored in the storage means is selected using the noise verification means, and the selected recognition model parameter set is used by the speech recognition means. Speech recognition can therefore be performed accurately even in various situations where the background noise environment changes while speech is input from a communication terminal such as a mobile communication terminal. Furthermore, in the third embodiment, the communication terminal 31 does not need to have a speech recognition means, a storage area for storing a recognition model parameter set, or a noise verification means, so the storage capacity of the communication terminal 31 can be saved even further than in the case of FIG. 3.

【００４８】図５は本発明に係る音声認識システムの第
４の実施形態の構成例を示す図である。なお、図５にお
いて図１，図３，図４と同様の箇所には同じ符号を付し
ている。図５を参照すると、この音声認識システムは、
通信端末（例えば、携帯電話などの移動用通信端末）４
１と、通信端末４１と通信網５００を介して接続された
特徴抽出手段４２，音声認識手段２３，騒音検証手段３
４，記憶手段２とを備えている。FIG. 5 is a diagram showing a configuration example of a fourth embodiment of the speech recognition system according to the present invention. In FIG. 5, the same parts as those in FIGS. 1, 3, and 4 are denoted by the same reference numerals. Referring to FIG. 5, the speech recognition system includes a communication terminal (for example, a mobile communication terminal such as a mobile phone) 41, and a feature extraction means 42, a speech recognition means 23, a noise verification means 34, and a storage means 2 that are connected to the communication terminal 41 via a communication network 500.

【００４９】ここで、通信端末４は、音声および／また
は周囲の騒音が入力される音声入力手段１１と、通信端
末中央処理装置４５とを有している。Here, the communication terminal 41 has a voice input means 11 to which voice and/or ambient noise is input, and a communication terminal central processing unit 45.

【００５０】なお、音声入力手段１１，音声認識手段２
３，騒音検証手段３４，記憶手段２は、図１，図３，図
４において説明したと同様の構成および機能のものとな
っている。The voice input means 11, the speech recognition means 23, the noise verification means 34, and the storage means 2 have the same configuration and functions as those described with reference to FIGS. 1, 3, and 4.

【００５１】この第４の実施形態では、本発明を実施す
るための手段を全て通信網５００側に設けている。In the fourth embodiment, all means for implementing the present invention are provided on the communication network 500 side.

【００５２】この第４の実施形態では、通信網５００を
介して通信端末４１から伝送された音声を通信網５００
に接続されている特徴抽出手段４２を用いて、特徴量を
抽出する。この場合、通信網５００に伝送される対象
は、特徴量などのデータではなく音声であるため、通信
網５００としては、広く一般に普及している音声用の公
衆回線網を利用することが可能である。In the fourth embodiment, the feature extraction means 42 connected to the communication network 500 extracts the feature amount from the voice transmitted from the communication terminal 41 via the communication network 500. In this case, what is transmitted over the communication network 500 is voice rather than data such as feature amounts, so a widely used public telephone network for voice can be used as the communication network 500.

【００５３】このように、第４の実施形態では、通信端
末と通信網を介して特徴抽出手段，騒音検証手段，音声
認識手段を備え、前記騒音検証手段を用いて前記記憶手
段に記憶されている認識モデルパラメータセットを選択
し、選択された認識モデルパラメータセットを音声認識
手段に用いるので、移動用通信端末などの通信端末から
の音声入力時の背景騒音環境が変化する様々な場面にお
いても、精度よく音声認識を行うことができる。さら
に、この第４の実施形態では、通信端末３１側に、音声
認識手段と認識モデルパラメータセットを保存するため
の記憶領域と騒音検証手段とを持つ必要がなくなるの
で、通信端末３１の記憶容量を図３の場合よりもさらに
一層節約することができる。As described above, in the fourth embodiment, the feature extraction means, the noise verification means, and the speech recognition means are connected to the communication terminal via the communication network, a recognition model parameter set stored in the storage means is selected using the noise verification means, and the selected recognition model parameter set is used by the speech recognition means. Speech recognition can therefore be performed accurately even in various situations where the background noise environment changes while speech is input from a communication terminal such as a mobile communication terminal. Furthermore, in the fourth embodiment, the communication terminal 31 does not need to have a speech recognition means, a storage area for storing a recognition model parameter set, or a noise verification means, so the storage capacity of the communication terminal 31 can be saved even further than in the case of FIG. 3.
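The four embodiments differ mainly in which means sit in the terminal and which sit on the network side. The table below restates that placement as data, as a reading aid only. The key names are illustrative labels chosen here, the communication terminal central processing units are left out, and the parameter-set storage area that the first embodiment keeps inside the terminal is not modeled.

```python
# Placement of each means per embodiment, as described in the text.
PLACEMENT = {
    1: {"terminal": {"voice_input", "feature_extraction",
                     "speech_recognition", "noise_verification"},
        "network": {"storage"}},
    2: {"terminal": {"voice_input", "feature_extraction", "noise_verification"},
        "network": {"speech_recognition", "storage"}},
    3: {"terminal": {"voice_input", "feature_extraction"},
        "network": {"speech_recognition", "noise_verification", "storage"}},
    4: {"terminal": {"voice_input"},
        "network": {"feature_extraction", "speech_recognition",
                    "noise_verification", "storage"}},
}


def terminal_side(embodiment):
    """Means that remain inside the communication terminal."""
    return PLACEMENT[embodiment]["terminal"]


def moved_to_network(from_embodiment, to_embodiment):
    """Means that leave the terminal when going from one embodiment to another."""
    return terminal_side(from_embodiment) - terminal_side(to_embodiment)
```

Each step from one embodiment to the next moves exactly one more means off the terminal, which is the source of the storage savings the text claims.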

【００５４】なお、上述の各実施形態において、騒音検
証手段１４，３４に設けられている検証モデルと音声認
識手段１３，２３または記憶手段２の認識モデルパラメ
ータを入力音声特徴量を用いて更新することも可能であ
る。すなわち、特徴抽出手段１２，４２により得られた
特徴量を用いて、認識モデルパラメータや騒音の検証モ
デルを更新することができる。In each of the above-described embodiments, the verification models provided in the noise verification units 14 and 34 and the recognition model parameters of the voice recognition units 13 and 23 or the storage unit 2 are updated using the input voice feature amounts. It is also possible. That is, the recognition model parameters and the noise verification model can be updated using the feature amounts obtained by the feature extraction units 12 and 42.

【００５５】具体的に、認識モデルパラメータを更新す
る場合は、音声の特徴量と正解の認識モデルパラメータ
とを音声認識手段１２，２３により照合する。この時の
照合経路より、認識モデルパラメータと特徴量とを対応
付けできるので、次式によって認識モデルパラメータの
更新処理を行うことができる。Specifically, when the recognition model parameters are updated, the speech recognition means 13, 23 collates the feature amount of the speech with the correct recognition model parameters. The matching path obtained at this time associates each recognition model parameter with a feature amount, so the recognition model parameters can be updated by the following equation.

【００５６】[0056]

【数１】ｕ’_ni＝（１−ａ）ｕ_ni＋ａ・Ｘ_mi（０≦ａ≦１） [Equation 1] $u'_{ni} = (1 - a)\,u_{ni} + a\,X_{mi} \quad (0 \le a \le 1)$

【００５７】ここで、ｕ_niは変更前のパラメータ値であ
り、ｕ’_niは更新後のパラメータ値である。ただし、ｎ
はパラメータ番号、ｉは要素番号である。また、Ｘ
_miは、ｍフレーム目の音声特徴量を表している。また、
ａは、特徴量をどの程度パラメータに反映するかを決め
る適応係数である。適応係数ａは、例えば、ａ＝１０^-3
のように設定される。Here, u_ni is the parameter value before the update and u'_ni is the parameter value after the update, where n is the parameter number and i is the element number. X_mi represents the speech feature amount of the m-th frame. The adaptation coefficient a determines how strongly the feature amount is reflected in the parameter, and is set to, for example, a = 10^-3.
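Equation 1 is a per-element interpolation between the stored parameter and the observed feature. A minimal sketch is shown below, assuming that the frame m has already been aligned with parameter n by the matching path; the function name and the list-based vector representation are illustrative choices, not part of the original.

```python
def update_parameter(u, x, a=1e-3):
    """Apply u'_ni = (1 - a) * u_ni + a * X_mi to every element i.

    u: current parameter vector u_n (one value per element i)
    x: feature vector X_m of the frame aligned with parameter n
    a: adaptation coefficient, 0 <= a <= 1 (the text suggests 10^-3)
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("adaptation coefficient a must lie in [0, 1]")
    if len(u) != len(x):
        raise ValueError("parameter and feature vectors must have equal length")
    return [(1.0 - a) * u_i + a * x_i for u_i, x_i in zip(u, x)]
```

With a small coefficient each frame nudges the parameter only slightly, so the model drifts toward the current speaker and channel over many frames instead of following any single observation.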

【００５８】また、騒音の検証モデルを更新する場合に
ついても、上述した認識モデルパラメータの更新処理と
同様に処理を行えばよい。Also, when updating the noise verification model, the same processing as the above-described recognition model parameter update processing may be performed.

【００５９】なお、認識モデルパラメータ，検証モデル
の更新処理の仕方は、数１に限られるものではなく、Ｍ
ＡＰ（最大事後確率）推定法などの良く知られた適応手
法を用いることもできる。The method of updating the recognition model parameters and the verification model is not limited to Equation 1; a well-known adaptation method such as MAP (maximum a posteriori) estimation can also be used.

【００６０】このように、検証モデルと認識モデルパラ
メータを入力音声特徴量を用いて更新することによっ
て、背景騒音の影響だけでなく、話者の特性や通信時の
雑音を吸収することができ、認識精度を向上させること
ができる。As described above, by updating the verification model and the recognition model parameters using the input speech feature amount, not only the influence of the background noise but also the characteristics of the speaker and the noise at the time of communication can be absorbed. The recognition accuracy can be improved.

【００６１】また、上述の各実施形態において、騒音検
証手段１４，３４の検証モデルと記憶手段２の認識モデ
ルパラメータセットとを追加し、上述した方法により
（例えば数１により）、すなわち、入力音声特徴量を用
いて、追加された検証モデルと追加された認識モデルパ
ラメータセットとを更新することもできる。In each of the above embodiments, it is also possible to add a verification model to the noise verification means 14, 34 and a recognition model parameter set to the storage means 2, and to update the added verification model and the added recognition model parameter set by the method described above (for example, by Equation 1), that is, by using the input speech feature amount.

【００６２】図６は騒音検証手段１４，３４の検証モデ
ルと記憶手段２の認識モデルパラメータセットとを追加
し、追加された検証モデルと追加された認識モデルパラ
メータセットとを更新する処理を説明するための図であ
る。FIG. 6 is a diagram for explaining the process of adding a verification model to the noise verification means 14, 34 and a recognition model parameter set to the storage means 2, and updating the added verification model and the added recognition model parameter set.

【００６３】騒音検証手段１４，３４の検証モデルと記
憶手段２の認識モデルパラメータセットとを追加し、追
加された検証モデルと追加された認識モデルパラメータ
セットとを更新する処理は、具体的には、次のようにし
てなされる。すなわち、認識モデルパラメータセットに
ついては、まず、記憶手段２の認識モデルパラメータセ
ット（例えばＰ１）をコピーし、新しい認識モデルパラ
メータセット（図６では番号ｎ＋１のパラメータセット
Ｐ（ｎ＋１））を作成する。ここで、認識モデルパラメ
ータセットＰ１だけは、静かな環境で訓練された特別な
認識モデルパラメータセットであり、どの騒音環境の影
響も受けていないとする。その後、認識モデルパラメー
タセットＰ（ｎ＋１）に対し、上述した認識モデルパラ
メータの更新処理を行うことにより、他の騒音環境の影
響を受けずに、現在使用中の騒音環境に適応した認識モ
デルパラメータセットを獲得することができる。The process of adding a verification model to the noise verification means 14, 34 and a recognition model parameter set to the storage means 2, and updating the added verification model and the added recognition model parameter set, is performed as follows. For the recognition model parameter set, a recognition model parameter set in the storage means 2 (for example, P1) is first copied to create a new recognition model parameter set (in FIG. 6, the parameter set P(n+1) with the number n+1). Here, only the recognition model parameter set P1 is assumed to be a special recognition model parameter set trained in a quiet environment and unaffected by any noise environment. The recognition model parameter update process described above is then applied to the recognition model parameter set P(n+1), which yields a recognition model parameter set adapted to the noise environment currently in use without being affected by other noise environments.
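The copy-then-adapt step can be sketched as below. The data layout is an assumption made for illustration: parameter sets are held in a dictionary keyed by set number, each set maps a parameter number n to a vector, and the aligned frames arrive as (n, feature vector) pairs. The update applied to each aligned frame is Equation 1.

```python
import copy


def add_adapted_set(parameter_sets, aligned_frames, base=1, a=1e-3):
    """Create P(n+1) as a copy of the base set and adapt it with Equation 1.

    parameter_sets: {set number: {parameter number: vector}}
    aligned_frames: iterable of (parameter number, feature vector) pairs
    base: number of the set to copy (P1, the quiet-environment set, by default)
    Returns the number assigned to the new set.
    """
    new_number = max(parameter_sets) + 1
    # Deep copy so that adapting the new set never alters the base set.
    new_set = copy.deepcopy(parameter_sets[base])
    for n, x in aligned_frames:
        new_set[n] = [(1.0 - a) * u_i + a * x_i
                      for u_i, x_i in zip(new_set[n], x)]
    parameter_sets[new_number] = new_set
    return new_number
```

Starting from the quiet-environment set keeps traces of other noise environments out of the new set, which is the property the paragraph above relies on.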

【００６４】また、検証モデルについては、騒音検証手
段１４，３４では、（ｎ＋１）番目の検証モデルＱ（ｎ
＋１）用に記憶領域を新たに確保する。その後、上述し
た検証モデルの更新処理を行なうことにより（例えば、
数１の適応係数ａをａ＝１として更新処理を行なうこと
により）、（ｎ＋１）番目の検証モデルＱ（ｎ＋１）を
作成する。この処理により、他の騒音環境の成分を含ま
ず、現環境の騒音状態を検証するための検証モデルを作
成できる。As for the verification model, the noise verification means 14, 34 newly reserves a storage area for the (n+1)-th verification model Q(n+1). The verification model update process described above is then performed (for example, with the adaptation coefficient a in Equation 1 set to a = 1) to create the (n+1)-th verification model Q(n+1). This process creates a verification model for verifying the noise state of the current environment that contains no components of other noise environments.
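Setting a = 1 in Equation 1 shows why the new verification model carries nothing over from other environments. The substitution below is a worked step added for illustration, using the same symbols as Equation 1:

```latex
u'_{ni} = (1 - 1)\,u_{ni} + 1 \cdot X_{mi} = X_{mi}
```

The previous value u_ni drops out entirely, so whatever happened to be in the newly reserved storage area has no effect, and Q(n+1) is determined only by the noise features observed in the current environment.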

【００６５】このように、騒音環境を検証する検証モデ
ルと認識モデルパラメータとを現在の使用環境に特化し
て作成することにより、認識精度を飛躍的に向上させる
ことが可能となる。As described above, by creating the verification model for verifying the noise environment and the recognition model parameters specifically for the current use environment, the recognition accuracy can be drastically improved.

【００６６】すなわち、騒音検証手段１４，３４の検証
モデルと記憶手段２の認識モデルパラメータセットとを
追加し、追加された検証モデルと追加された認識モデル
パラメータセットとを更新する処理は、騒音検証手段を
用いて認識モデルパラメータセットを選択したり、認識
モデルパラメータや検証モデルの更新処理を行っても、
満足な性能が得られないときに効果がある。That is, the process of adding a verification model to the noise verification means 14, 34 and a recognition model parameter set to the storage means 2, and updating the added verification model and the added recognition model parameter set, is effective when satisfactory performance cannot be obtained even after selecting a recognition model parameter set using the noise verification means or updating the recognition model parameters and the verification model.

【００６７】ところで、図１，図３，図４あるいは図５
の通信端末１，２１，３１，４１は、ＤＳＰ（ディジタ
ル信号処理プロセッサ）などの専用のハードウエアで実
現する以外に，例えばワークステーション，パーソナル
コンピュータなどに用いられている汎用のハードウエア
で実現することも可能である。図７は本発明の通信端末
のハードウェア構成例を示す図である。図７を参照する
と、通信端末のハードウェアとして、全体を制御するＣ
ＰＵ５１と、ＣＰＵ５１の制御プログラム，読み出し専
用データなどが記憶されているＲＯＭ５２と、ＣＰＵ５
１の作業領域などに使用されるＲＡＭ５３と、データ記
憶領域として使用されるハードディスク５４と、音声入
力部５５と、通信インタフェース５６とが設けられてい
る。The communication terminals 1, 21, 31, and 41 of FIG. 1, FIG. 3, FIG. 4, or FIG. 5 can be realized not only by dedicated hardware such as a DSP (digital signal processor) but also by general-purpose hardware such as that used in workstations and personal computers. FIG. 7 is a diagram showing a hardware configuration example of the communication terminal of the present invention. Referring to FIG. 7, the hardware of the communication terminal includes a CPU 51 that controls the whole terminal, a ROM 52 that stores the control program of the CPU 51, read-only data, and the like, a RAM 53 used as a work area of the CPU 51, a hard disk 54 used as a data storage area, a voice input unit 55, and a communication interface 56.

【００６８】また、本発明の音声認識機能，とりわけ図
１，図３，図４，図５の通信端末１，２１，３１，４１
の機能は、例えばソフトウェアパッケージ（ＣＤ−ＲＯ
Ｍなどの情報記録媒体）の形態で提供することができ
る。すなわち、本発明は、汎用ＯＳが稼動する計算機上
の記憶装置（例えば図７のようなハードウェア構成の通
信端末のＲＡＭ５３やハードディスク５４等）にＣＤ−
ＲＯＭなどの記録媒体に記録されたプログラムを読込ま
せて、計算機のもつハードウェア構成で、所定の処理を
実行させることで実現できる。なお、記録媒体として
は、ＣＤ−ＲＯＭに限られるものではなく、ＲＯＭ、Ｒ
ＡＭ、フレキシブルディスク、メモリカードなどが用い
られてもよいし，通信網を介したダウンロードの形態で
もよい。また、記録媒体に記録されたプログラムは、ハ
ードウェアシステムに組込まれている記憶装置、例えば
ハードディスクにインストールすることにより、このプ
ログラムを実行して、本発明の音声認識などの機能を実
現することができる。The speech recognition function of the present invention, in particular the functions of the communication terminals 1, 21, 31, and 41 of FIGS. 1, 3, 4, and 5, can be provided in the form of, for example, a software package (an information recording medium such as a CD-ROM). That is, the present invention can be realized by loading a program recorded on a recording medium such as a CD-ROM into a storage device of a computer on which a general-purpose OS runs (for example, the RAM 53 or the hard disk 54 of a communication terminal having the hardware configuration of FIG. 7) and having the hardware of the computer execute predetermined processing. The recording medium is not limited to a CD-ROM; a ROM, a RAM, a flexible disk, a memory card, or the like may be used, and the program may also be downloaded via a communication network. In addition, by installing the program recorded on the recording medium in a storage device built into the hardware system, for example a hard disk, the program can be executed to realize functions such as the speech recognition of the present invention.

【００６９】[0069]

【発明の効果】以上に説明したように、請求項１乃至請
求項１４記載の発明によれば、周囲の騒音を検証し、騒
音環境に最も適合する認識モデルパラメータを用いて、
音声認識を行なうようになっているので、使用環境によ
らずに精度良く音声認識を行うことができる。また、通
信網を介した記憶手段を設けることにより、通信端末の
記憶容量を節約できる。As described above, according to the inventions of claims 1 to 14, ambient noise is verified and speech recognition is performed using the recognition model parameters best suited to the noise environment, so speech recognition can be performed accurately regardless of the use environment. In addition, providing the storage means via the communication network saves storage capacity in the communication terminal.

【００７０】特に、請求項２乃至請求項４，請求項９乃
至請求項１１記載の発明によれば、手段ごとの処理を通
信網側に分散することにより、通信端末の記憶容量の大
きさに応じた実現方法を提供し、音声認識の精度を維持
できる。すなわち、通信端末の記憶容量が小さい場合で
も、通信網を介して分散処理を行うことにより、音声認
識の精度を維持することができる。In particular, according to the inventions of claims 2 to 4 and claims 9 to 11, distributing the processing of each means to the communication network side provides an implementation suited to the storage capacity of the communication terminal while maintaining speech recognition accuracy. That is, even when the storage capacity of the communication terminal is small, the accuracy of speech recognition can be maintained by performing distributed processing via the communication network.

【００７１】また、請求項１２記載の発明によれば、騒
音の検証モデル，認識モデルパラメータセットを使用環
境に応じて更新することにより、背景騒音の影響の他
に、話者の特性や通信時の雑音の影響を吸収し、認識精
度を向上することができる。すなわち、話者の特性や、
背景騒音の種類、通信経路上の雑音による音声の変動を
吸収し、音声認識精度を向上させることができる。According to the invention of claim 12, updating the noise verification model and the recognition model parameter set according to the use environment absorbs not only the influence of background noise but also the influence of speaker characteristics and noise during communication, improving recognition accuracy. That is, variations in speech caused by speaker characteristics, the type of background noise, and noise on the communication path are absorbed, and speech recognition accuracy can be improved.

【００７２】また、請求項１３記載の発明によれば、騒
音の検証モデル，認識モデルパラメータセットを新たに
追加することにより、さらに、認識精度を向上すること
ができる。すなわち、話者の特性や、背景騒音の種類、
通信経路上の雑音による音声の変動を吸収し、音声認識
精度を向上させることができる。According to the invention of claim 13, newly adding a noise verification model and a recognition model parameter set further improves recognition accuracy. That is, variations in speech caused by speaker characteristics, the type of background noise, and noise on the communication path are absorbed, and speech recognition accuracy can be improved.

[Brief description of the drawings]

【図１】本発明に係る音声認識システムの第１の実施形
態の構成例を示す図である。FIG. 1 is a diagram showing a configuration example of a first embodiment of a speech recognition system according to the present invention.

【図２】騒音検証手段の構成例を示す図である。FIG. 2 is a diagram illustrating a configuration example of a noise verification unit.

【図３】本発明に係る音声認識システムの第２の実施形
態の構成例を示す図である。FIG. 3 is a diagram showing a configuration example of a second embodiment of the speech recognition system according to the present invention.

【図４】本発明に係る音声認識システムの第３の実施形
態の構成例を示す図である。FIG. 4 is a diagram showing a configuration example of a third embodiment of the speech recognition system according to the present invention.

【図５】本発明に係る音声認識システムの第４の実施形
態の構成例を示す図である。FIG. 5 is a diagram showing a configuration example of a fourth embodiment of the speech recognition system according to the present invention.

【図６】騒音検証手段の検証モデルと記憶手段の認識パ
ラメータセットとを追加し、追加された検証モデルと追
加された認識パラメータセットとを更新する処理を説明
するための図である。FIG. 6 is a diagram for explaining a process of adding a verification model of a noise verification unit and a recognition parameter set of a storage unit, and updating the added verification model and the added recognition parameter set.

【図７】本発明の通信端末のハードウェア構成例を示す
図である。FIG. 7 is a diagram illustrating a hardware configuration example of a communication terminal according to the present invention.

[Explanation of symbols]

１，２１，３１，４１ 通信端末 1, 21, 31, 41 Communication terminal
２ 記憶手段 2 Storage means
１１ 音声入力手段 11 Voice input means
１２，４２ 特徴抽出手段 12, 42 Feature extraction means
１３，２３ 音声認識手段 13, 23 Speech recognition means
１４，３４ 騒音検証手段 14, 34 Noise verification means
１５ 通信端末中央処理装置 15 Communication terminal central processing unit
１６ 比較器 16 Comparator
５１ ＣＰＵ 51 CPU
５２ ＲＯＭ 52 ROM
５３ ＲＡＭ 53 RAM
５４ ハードディスク 54 Hard disk
５５ 音声入力部 55 Voice input unit
５６ 通信インタフェース 56 Communication interface
２００，３００，４００，５００ 通信網 200, 300, 400, 500 Communication network

Claims

[Claims]

1. A speech recognition system comprising a communication terminal and a storage means connected to the communication terminal via a communication network, wherein the communication terminal has a voice input means to which voice and/or ambient noise is input, a feature extraction means that extracts a feature amount of the voice input from the voice input means, a speech recognition means that recognizes the voice by collating the feature amount of the voice with a predetermined recognition model parameter set, and a noise verification means that verifies the ambient noise input from the voice input means; the noise verification means, when it has verified the ambient noise, selects a recognition model parameter set corresponding to the type of the verified noise from a plurality of recognition model parameter sets stored in the storage means; and the speech recognition means performs speech recognition using the recognition model parameter set selected by the noise verification means.

2. A speech recognition system comprising a communication terminal, a speech recognition means connected to the communication terminal via a communication network, and a storage means connected to the communication terminal via a communication network, wherein the communication terminal has a voice input means to which voice and/or ambient noise is input, a feature extraction means that extracts a feature amount of the voice input from the voice input means, and a noise verification means that verifies the ambient noise input from the voice input means; the noise verification means, when it has verified the ambient noise, selects a recognition model parameter set corresponding to the type of the verified noise from a plurality of recognition model parameter sets stored in the storage means; and the speech recognition means, when the feature amount of the voice extracted by the feature extraction means of the communication terminal is sent from the communication terminal, performs speech recognition by collating the received feature amount with the recognition model parameter set selected by the noise verification means.

3. A speech recognition system comprising a communication terminal, a speech recognition means connected to the communication terminal via a communication network, a noise verification means connected to the communication terminal via a communication network, and a storage means connected to the communication terminal via a communication network, wherein the communication terminal has a voice input means to which voice and/or ambient noise is input, and a feature extraction means that extracts a feature amount of the voice input from the voice input means; the noise verification means, when it has verified the ambient noise input from the voice input means, selects a recognition model parameter set corresponding to the type of the verified noise from a plurality of recognition model parameter sets stored in the storage means; and the speech recognition means, when the feature amount of the voice extracted by the feature extraction means of the communication terminal is sent from the communication terminal, performs speech recognition by collating the received feature amount with the recognition model parameter set selected by the noise verification means.

4. A speech recognition system comprising a communication terminal, a feature extraction means connected to the communication terminal via a communication network, a speech recognition means connected to the communication terminal via a communication network, a noise verification means connected to the communication terminal via a communication network, and a storage means connected to the communication terminal via a communication network, wherein the communication terminal has a voice input means to which voice and/or ambient noise is input; the feature extraction means, when the voice input by the voice input means of the communication terminal is sent from the communication terminal, extracts a feature amount of the received voice; the noise verification means, when the ambient noise input by the voice input means of the communication terminal is sent from the communication terminal, verifies the ambient noise and selects a recognition parameter set corresponding to the type of the verified noise from a plurality of recognition parameter sets stored in the storage means; and the speech recognition means performs speech recognition by collating the feature amount of the voice extracted by the feature extraction means with the recognition model parameter set selected by the noise verification means.

5. The voice recognition system according to claim 1, wherein the communication terminal is a mobile communication terminal.

6. A speech recognition method in which there are provided a feature extraction means that extracts a feature amount of voice input from a communication terminal, a speech recognition means that recognizes the voice from the feature amount, a noise verification means that verifies ambient noise input from the communication terminal, and a storage means connected to the communication terminal via a communication network, wherein the speech recognition means performs speech recognition using a recognition model parameter set selected, by means of the noise verification means, from among a plurality of recognition model parameter sets stored in the storage means.

7. The speech recognition method according to claim 6, wherein a recognition model parameter set stored in the speech recognition means is transferred to the storage means via a communication network.

8. The speech recognition method according to claim 6, wherein the communication terminal is provided with the feature extraction means, the speech recognition means, and the noise verification means, a recognition model parameter set stored in the storage means is selected using the noise verification means, and the selected recognition model parameter set is fetched into the communication terminal via a communication network and stored in the speech recognition means.

9. The speech recognition method according to claim 6, wherein the feature extraction means and the noise verification means are provided in the communication terminal, the speech recognition means is connected to the communication terminal via a communication network, a recognition model parameter set stored in the storage means is selected using the noise verification means, and the selected recognition model parameter set is used by the speech recognition means.

10. The speech recognition method according to claim 6, wherein the feature extraction means is provided in the communication terminal, the noise verification means and the speech recognition means are connected to the communication terminal via a communication network, a recognition model parameter set stored in the storage means is selected using the noise verification means, and the selected recognition model parameter set is used by the speech recognition means.

11. The speech recognition method according to claim 6, wherein the feature extraction means, the noise verification means, and the speech recognition means are connected to the communication terminal via a communication network, a recognition model parameter set stored in the storage means is selected using the noise verification means, and the selected recognition model parameter set is used by the speech recognition means.

12. The speech recognition method according to claim 6, wherein a verification model of the noise verification means and recognition model parameters of the speech recognition means and/or the storage means are updated using input speech feature amounts.

13. The speech recognition method according to claim 6, wherein a verification model of the noise verification means and a recognition model parameter set of the storage means can be added, and the added verification model and the added recognition model parameter set are updated using input speech feature amounts.

14. A computer-readable recording medium on which is recorded a program for causing a computer to execute a process of, when ambient noise has been verified, selecting a recognition model parameter set corresponding to the type of the verified noise from a plurality of recognition model parameter sets stored in a storage means, and performing speech recognition using the selected recognition model parameter set.