JPH11352985A

JPH11352985A - Voice recognition device

Info

Publication number: JPH11352985A
Application number: JP10158895A
Authority: JP
Inventors: Kenichi Yamamoto; 健一山本; Satoru Oishi; 哲大石; Takahide Takahashi; 隆英高橋
Original assignee: Toshiba TEC Corp
Current assignee: Toshiba TEC Corp
Priority date: 1998-06-08
Filing date: 1998-06-08
Publication date: 1999-12-24

Abstract

PROBLEM TO BE SOLVED: To prevent unnecessary language element codes output by voice recognition from being passed to an application program. SOLUTION: This voice recognition device comprises: a voice input part 11 for inputting the voice of a speaker; a voice recognition resource 12 for storing a plurality of words and phrases to be recognized in advance and a language element code corresponding to each word or phrase; a voice recognition part 13 for recognizing words and phrases from the voice input through the voice input part 11 and, when the recognized words and phrases include a word or phrase to be recognized in advance, extracting the corresponding language element code from the voice recognition resource 12 and outputting it; and a filter part 14 for removing all of the language element codes when a necessary language element code is not included among the plurality of language element codes output from the voice recognition part 13 in one voice recognition process.

Description

DETAILED DESCRIPTION OF THE INVENTION

BACKGROUND OF THE INVENTION

1. Field of the Invention

[0001] The present invention relates to a speech recognition apparatus that recognizes words and phrases from input speech and encodes them.

2. Description of the Related Art

[0002] As shown in FIG. 12, a conventional speech recognition apparatus comprises: a voice input unit 1 having a microphone 1a for inputting voice and an A/D converter 1b for converting the voice from the microphone into a digital signal; a speech recognition resource 2, which is a collection of language element codes defined for the words and phrases to be recognized in advance; a speech recognition unit 3 that recognizes words and phrases based on the output from the voice input unit 1 and extracts the language element codes corresponding to those words and phrases based on the speech recognition resource 2; and an application program 4 that uses the code string of language element codes extracted by the speech recognition unit 3 as speech recognition data.

[0003] In such an apparatus, when a speaker speaks toward the microphone 1a, the speech is converted into a digital signal by the voice input unit 1 and supplied to the speech recognition unit 3. The speech recognition unit 3 recognizes the predefined words and phrases in the order in which they were uttered, and a code string of language element codes is passed in that order to the application program 4 as speech recognition data.

PROBLEM TO BE SOLVED BY THE INVENTION

[0004] However, when such a speech recognition apparatus is used to perform speech recognition on, for example, a conversation between a customer and a clerk, words and phrases that the application program 4 does not need, such as greetings, small talk, and noise, are also taken in as speech through the microphone 1a. These may be erroneously recognized as necessary words and phrases, and the result may then be passed to the application program 4. This can also cause the application program 4 to malfunction.

[0005] For this reason, the speech recognition apparatus described above cannot be used as it is in cases where only the necessary words and phrases are to be recognized from within a speaker's conversation.

[0006] Accordingly, the present invention aims to provide a speech recognition apparatus that can prevent unnecessary language element codes output by speech recognition from being passed to an application program.

MEANS FOR SOLVING THE PROBLEM

[0007] The invention of claim 1 is a speech recognition apparatus comprising: voice input means for inputting a speaker's voice; speech recognition means for recognizing words and phrases from the voice input through the voice input means; language element code storage means for storing a plurality of words and phrases to be recognized in advance and a language element code corresponding to each word or phrase; language element code output means for extracting from the language element code storage means, and outputting, the language element code corresponding to each word or phrase when the words and phrases recognized by the speech recognition means include words and phrases to be recognized in advance; and filter means for removing the unnecessary language element codes from among those output by the language element code output means and passing the necessary ones.

[0008] The invention of claim 2 is a speech recognition apparatus comprising: voice input means for inputting a speaker's voice; speech recognition means for recognizing words and phrases from the voice input through the voice input means; language element code storage means for storing a plurality of words and phrases to be recognized in advance and a language element code corresponding to each word or phrase; language element code output means for extracting from the language element code storage means, and outputting, the language element code corresponding to each word or phrase when the words and phrases recognized by the speech recognition means include words and phrases to be recognized in advance; and filter means that, for the plurality of language element codes output from the language element code output means in one speech recognition process, removes all of the language element codes if even one necessary language element code is missing, and passes the language element codes if all of the necessary language element codes are present.

[0009] The invention of claim 3 is a speech recognition apparatus comprising: voice input means for inputting a speaker's voice; speech recognition means for recognizing words and phrases from the voice input through the voice input means; language element code storage means for storing a plurality of words and phrases to be recognized in advance and, for each word or phrase, a corresponding language element code within a predetermined numerical range; language element code output means for extracting from the language element code storage means, and outputting, the language element code corresponding to each word or phrase when the words and phrases recognized by the speech recognition means include words and phrases to be recognized in advance; and filter means that removes a language element code output from the language element code output means if it is outside the predetermined numerical range and passes it if it is within the predetermined numerical range.

[0010] The invention of claim 4 is a speech recognition apparatus comprising: voice input means for inputting a speaker's voice; speech recognition means for recognizing words and phrases from the voice input through the voice input means; language element code storage means for storing words and phrases indicating a plurality of amounts to be recognized by the speech recognition means as a deposit from a customer, together with a language element code indicating the amount corresponding to each word or phrase; language element code output means for extracting from the language element code storage means, and outputting, the language element code corresponding to each word or phrase when the words and phrases recognized by the speech recognition means include words and phrases to be recognized in advance; and filter means that takes in price data from outside, removes a language element code output from the language element code output means if the amount it indicates is lower than the price in the externally supplied data, and passes the language element code if the amount is equal to or greater than that price.
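The price comparison performed by the filter means of claim 4 can be illustrated with a minimal sketch. It assumes that each language element code is the decimal string of the amount in yen, and the function name `filter_by_price` and its parameters are hypothetical, not taken from the patent.

```python
from typing import List


def filter_by_price(codes: List[str], price: int) -> List[str]:
    """Pass codes whose amount is at least `price`; remove the rest.

    Assumption: each code is the decimal string of a deposit amount in yen,
    and `price` is the total supplied from outside the filter.
    """
    return [code for code in codes if int(code) >= price]


# A 1,000 yen deposit is rejected against a 2,500 yen price,
# while a 3,000 yen deposit is passed through.
passed = filter_by_price(["1000", "3000"], price=2500)
```

An amount equal to the price is passed, which matches the "equal to or greater than" wording of the claim.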

[0011] The invention of claim 5 is a speech recognition apparatus comprising: voice input means for inputting a speaker's voice; speech recognition means for recognizing words and phrases from the voice input through the voice input means; language element code storage means for storing words and phrases indicating a plurality of amounts to be recognized by the speech recognition means as a deposit from a customer, together with a language element code indicating the amount corresponding to each word or phrase; language element code output means for extracting from the language element code storage means, and outputting, the language element code corresponding to each word or phrase when the words and phrases recognized by the speech recognition means include words and phrases to be recognized in advance; and filter means that takes in price data from outside, removes a language element code output from the language element code output means if the amount it indicates does not satisfy a condition derived from the price in the externally supplied data, and passes the language element code if the condition is satisfied.

[0012] The invention of claim 6 is a speech recognition apparatus comprising: voice input means for inputting a speaker's voice; speech recognition means for recognizing words and phrases from the voice input through the voice input means; language element code storage means for storing a plurality of words and phrases to be recognized in advance and a language element code corresponding to each word or phrase; product information storage means configured as a collection of product information associated with the language element codes; language element code output means for extracting from the language element code storage means, and outputting, the language element code corresponding to each word or phrase when the words and phrases recognized by the speech recognition means include words and phrases to be recognized in advance; and filter means that searches the product information storage means for product information using a language element code output from the language element code output means as a key, removes the language element code if there is no matching product information, and passes the language element code if matching product information exists.
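The lookup-based filter of claim 6 amounts to a key-existence check against the product information storage. The sketch below is only an illustration: the dictionary `PRODUCT_INFO`, its fields, and the function name `filter_by_product_info` are all hypothetical stand-ins, since the patent text here does not specify the layout of the product information.

```python
from typing import Dict, List

# Hypothetical product information storage keyed by language element code.
# The record fields (name, unit_price) are assumptions for illustration.
PRODUCT_INFO: Dict[str, dict] = {
    "0101": {"name": "product A", "unit_price": 120},
    "0102": {"name": "product B", "unit_price": 250},
}


def filter_by_product_info(codes: List[str], product_info: Dict[str, dict]) -> List[str]:
    """Pass a code only if product information exists for it."""
    return [code for code in codes if code in product_info]


# "0109" has no matching product information, so only "0101" is passed.
passed = filter_by_product_info(["0101", "0109"], PRODUCT_INFO)
```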

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0013] A first embodiment, in which the present invention is applied to a business processing device such as an electronic cash register or POS terminal that performs merchandise sales code registration processing and the like, is described below with reference to FIGS. 1 to 3.

[0014] FIG. 1 is a functional block diagram showing the configuration of the business processing device according to this embodiment. The business processing device comprises: a voice input unit (voice input means) 11 having a microphone 11a for inputting voice as an analog signal and an A/D converter 11b for converting the voice from the microphone 11a into a digital signal; a speech recognition resource (language element code storage means) 12, which is a collection of language element codes defined for (associated with) the words and phrases to be recognized in advance; a speech recognition unit 13 that recognizes the words and phrases corresponding to the input voice based on the output from the voice input unit 11 (speech recognition means), extracts the language element codes corresponding to those words and phrases from the speech recognition resource 12, and outputs them (language element code output means); a filter unit (filter means) 14 that, of the language element codes extracted by the speech recognition unit 13, outputs only the necessary ones based on a predetermined rule; and an application program 15 that uses the language element codes from the filter unit 14.

[0015] The speech recognition resource 12 is implemented in a storage device such as a hard disk drive. Specifically, as shown in FIG. 2, it consists of the words and phrases to be recognized and the language element codes associated with them. The language element codes are all four-digit codes. The words and phrases to be recognized are classified into “product name” and “quantity”, and codes are assigned systematically within each class. As an example of the predetermined rule, a “product name” is expressed in the lower two digits with the upper two digits set to 01, and a “quantity” is expressed in the lower two digits with the upper two digits set to 00. For example, the phrase “product A” is associated with the language element code “0101”, and the phrase “1 piece” is associated with the language element code “0001”.
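The coding rule can be sketched as a small lookup table plus a classifier on the upper two digits. This is only an illustration: the dictionary `RESOURCE` is a hypothetical in-memory stand-in for the speech recognition resource 12, the entries other than “product A” and “1 piece” are assumed, and the function name `classify` is not from the patent.

```python
# Hypothetical stand-in for the speech recognition resource 12 of FIG. 2.
# Only "product A" -> "0101" and "1 piece" -> "0001" are stated in the text;
# the remaining entries are assumed for illustration.
RESOURCE = {
    "product A": "0101",
    "product B": "0102",
    "product C": "0103",
    "1 piece": "0001",
    "2 pieces": "0002",
    "3 pieces": "0003",
}


def classify(code: str) -> str:
    """Return the class implied by the upper two digits of a 4-digit code."""
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"not a four-digit language element code: {code!r}")
    upper = code[:2]
    if upper == "01":
        return "product name"
    if upper == "00":
        return "quantity"
    return "unknown"


product_class = classify(RESOURCE["product A"])   # upper digits 01
quantity_class = classify(RESOURCE["1 piece"])    # upper digits 00
```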

[0016] Although not illustrated, each word or phrase to be recognized is stored in advance in association with speech feature data of a standard speaker (speaker-independent type). Alternatively, it may be associated with speech feature data obtained by having the user actually utter it (speaker-dependent type).

[0017] The speech recognition unit 13, the filter unit 14, and the application program 15 are implemented on, for example, a personal computer equipped with a CPU (central processing unit), ROM (read-only memory), and RAM (random access memory). Specifically, they are software programs that are stored in a storage device such as a hard disk drive or in a memory such as the ROM, and that can be read by the CPU of the personal computer.

[0018] Of these, the speech recognition unit 13 performs speech recognition by detecting, based on the output from the voice input unit 11, the similarity or closeness between the input voice and the words and phrases for which speech feature data has been defined (associated) in advance in the speech recognition resource 12. For example, speech feature data for several different wordings that mean the same word or phrase are associated with that word or phrase in the speech recognition resource 12, and the input voice is recognized on that basis to identify the uttered word or phrase. The speech recognition unit 13 then extracts the language element code corresponding to the recognized word or phrase from the speech recognition resource 12 and outputs it.

[0019] The filter unit 14 removes unnecessary language element codes from those extracted by the speech recognition unit 13 based on a predetermined rule, and passes only the necessary language element codes to the application program 15.

[0020] In this embodiment, the normal result of one speech recognition process is that the speech recognition unit 13 outputs one “product name” language element code and one “quantity” language element code. When any other language element code is output, or when a language element code is missing, no language element code should be output. That is, if the speech recognition unit 13 outputs only one of the “product name” and “quantity” language element codes in one speech recognition process, that code is of no use to the application program 15 and must be removed.

[0021] Accordingly, the filter unit 14 removes language element codes according to the following rule: if, in one speech recognition process, there is not one language element code of the digit pattern assigned to each class, the required combination of language element codes is judged to be incomplete and the language element codes are removed.

[0022] For example, in this embodiment, language element codes of 100 or more are assigned to “product name” and language element codes of less than 100 are assigned to “quantity”. The language element codes are therefore removed unless the combination of codes from the speech recognition unit 13 consists of one code of 100 or more and one code of less than 100. As a result, unnecessary language element codes are not passed to the application program 15.
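The all-or-nothing rule of this embodiment can be sketched as follows. The sketch assumes the codes arrive as four-digit strings, reads “one each” as exactly one code per class, and uses the hypothetical function name `filter_all_or_nothing`; none of these details are fixed by the patent text.

```python
from typing import List


def filter_all_or_nothing(codes: List[str]) -> List[str]:
    """Pass the codes of one recognition process only if they form a complete pair.

    Assumption: a code of 100 or more is a product name and a code of less
    than 100 is a quantity. Anything other than exactly one of each is
    treated as a failed recognition and every code is removed.
    """
    values = [int(code) for code in codes]
    products = [v for v in values if v >= 100]
    quantities = [v for v in values if v < 100]
    if len(products) == 1 and len(quantities) == 1:
        return list(codes)
    return []


complete = filter_all_or_nothing(["0101", "0003"])  # both classes present
incomplete = filter_all_or_nothing(["0101"])        # quantity missing
```

Returning an empty list rather than the partial input reflects the rule that an incomplete combination is discarded as a whole.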

[0023] The application program 15 is a software program that performs predetermined business processing, such as registering merchandise sales codes and calculating the price, based on the language element codes that have passed through the filter unit 14. When a product name and a quantity are uttered, language element codes are generated and output through the filter unit 14. Using these codes, the application program 15 selects and displays the corresponding product name on a display or similar screen and carries out the subsequent accounting processing, such as registering the product code and calculating the price.

[0024] In this embodiment configured as described above, suppose the user of the apparatus utters “three of product A” in one speech recognition process, as shown in FIG. 3(a). The voice is converted into a digital signal by the voice input unit 11 and supplied to the speech recognition unit 13. The speech recognition unit 13 refers to the speech recognition resource 12 and detects the similarity or closeness between the input voice and the words and phrases defined in advance in the speech recognition resource. It outputs the language element code “0101” for “product A” and the language element code “0003” for “3 pieces”. These language element codes are passed to the filter unit 14. Because there is one language element code of 100 or more and one of less than 100, the filter unit 14 judges that the speech was recognized normally and passes these language element codes to the application program 15.

[0025] In contrast, suppose the user of the apparatus utters “×× of product A” (where ×× is a word unrelated to “quantity”) in one speech recognition process, as shown in FIG. 3(b). The voice is converted into a digital signal by the voice input unit 11 and supplied to the speech recognition unit 13. The speech recognition unit 13 refers to the speech recognition resource 12 and detects the similarity or closeness between the input voice and the words and phrases defined in advance in the speech recognition resource. It outputs the language element code “0101” for “product A”, but because “××” is a word not defined in the speech recognition resource 12, no language element code is output for it. The filter unit 14 then finds that the one language element code of less than 100 is missing, judges that the speech was not recognized normally, and removes the language element code. In this case, therefore, no language element code is passed to the application program 15.

[0026] Thus, a filter unit 14 is provided that passes the language element codes from the speech recognition unit 13 to the application program 15 only when there is one each for “product” and “quantity”. Even when speech recognition is performed on a conversation between a customer and a clerk, words and phrases that the application program 15 does not need, such as greetings, small talk, and noise, are removed by the filter unit 14 and are not passed to the application program 15, even if they are misrecognized and cause the speech recognition unit 13 to output unnecessary language element codes. Malfunction of the application program 15 can therefore be prevented. Consequently, a speech recognition apparatus can be provided that encodes speech accurately even when it is used to recognize only the necessary words and phrases from within a speaker's conversation.

[0027] The speech recognition unit 13 selects the closest match from among the predefined words and phrases to be recognized, but any unnecessary language element code that results can be removed by the filter unit 14. Malfunction of the application program 15 can therefore be prevented without defining, in the speech recognition resource 12, every one of the user's characteristic wordings for the words and phrases to be recognized. The same effect is obtained in the embodiments that follow, because they too can remove unnecessary language element codes with the filter unit 14.

[0028] Next, a second embodiment, in which the present invention is applied to a business processing device such as an electronic cash register or POS terminal that performs merchandise sales code registration processing and the like, is described with reference to FIG. 4. The functional block diagram of the business processing device and the configuration of the speech recognition resource 12 in this embodiment are the same as those shown in FIGS. 1 and 2, respectively, so their detailed description is omitted.

[0029] The filter unit 14 in this embodiment differs from the filter unit 14 in the first embodiment in that it removes a language element code from the speech recognition unit 13 when that code falls outside the numerical range assigned in advance.

[0030] That is, the numerical range of the language element codes used in the speech recognition resource 12 is determined in advance (for example, in the case shown in FIG. 2, the range for “product” is “0101” to “0103” and the range for “quantity” is “0001” to “0003”). When a language element code outside these ranges is output, for example because of an error in the speech recognition unit 13, the unnecessary language element code is removed so that it is not passed to the application program 15.

[0031] In this embodiment configured as described above, suppose the user of the apparatus utters “three of product A”, as shown in FIG. 4(a). The voice is converted into a digital signal by the voice input unit 11 and supplied to the speech recognition unit 13. The speech recognition unit 13 refers to the speech recognition resource 12 and detects the similarity or closeness between the input voice and the words and phrases defined in advance in the speech recognition resource. It outputs the language element code “0101” for “product A” and the language element code “0003” for “3 pieces”. These language element codes are passed to the filter unit 14. The filter unit 14 judges that these language element codes are within the preassigned numerical ranges and passes them to the application program 15.

[0032] In contrast, suppose the user of the apparatus utters “three of ××” (where ×× is a “product” not registered in the speech recognition resource), as shown in FIG. 4(b). The voice is converted into a digital signal by the voice input unit 11 and supplied to the speech recognition unit 13. The speech recognition unit 13 refers to the speech recognition resource 12 and detects the similarity or closeness between the input voice and the words and phrases defined in advance in the speech recognition resource. It outputs the language element code “0003” for “3 pieces”. Because “××” is a word not defined in the speech recognition resource 12, suppose that a misrecognition by the speech recognition unit 13 causes it to output, for example, the undefined language element code “0105”. The filter unit 14 then removes the language element code “0105”, which lies outside the preassigned numerical range, and passes only the language element code “0003”, which lies within the preassigned numerical range, to the application program 15.
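The range check of the second embodiment can be sketched with the FIG. 2 ranges quoted in the text (“0101” to “0103” for products, “0001” to “0003” for quantities). The constant names and the function name `filter_by_range` are hypothetical, and the codes are assumed to arrive as four-digit strings.

```python
from typing import List

# Numerical ranges taken from the FIG. 2 example in the text.
# Python ranges exclude their upper bound, hence 104 and 4.
PRODUCT_RANGE = range(101, 104)   # "0101" to "0103"
QUANTITY_RANGE = range(1, 4)      # "0001" to "0003"


def filter_by_range(codes: List[str]) -> List[str]:
    """Remove each code outside the preassigned ranges; pass the others."""
    return [
        code
        for code in codes
        if int(code) in PRODUCT_RANGE or int(code) in QUANTITY_RANGE
    ]


# The misrecognized code "0105" is removed and "0003" is passed.
passed = filter_by_range(["0105", "0003"])
```

Unlike the first embodiment, this filter judges each code on its own, so a valid code survives even when its companion is removed.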

[0033] Thus, a filter unit 14 is provided that passes the language element codes from the speech recognition unit 13 to the application program 15 only when there is one each for “product” and “quantity”. Even when speech recognition is performed on a conversation between a customer and a clerk, words and phrases that the application program 15 does not need, such as greetings, small talk, and noise, are removed by the filter unit 14 and are not passed to the application program 15, even if they are misrecognized because of an error or the like and cause the speech recognition unit 13 to output unnecessary language element codes. Malfunction of the application program 15 can therefore be prevented. Consequently, as in the first embodiment, a speech recognition apparatus can be provided that encodes speech accurately even when it is used to recognize only the necessary words and phrases from within a speaker's conversation.

【００３４】次に、本発明を電子式キャッシュレジス
タ、ＰＯＳ端末などで客からの預り金の処理を行うなど
を行う業務処理装置に適用した場合の第３の実施の形態
を図５ないし図７を参照して説明する。なお、上記第１
の実施の形態と同一部分には同一符号を付して詳細な説
明を省略する。Next, a third embodiment, in which the present invention is applied to a business processing device such as an electronic cash register or POS terminal that processes deposits received from customers, will be described with reference to FIGS. 5 to 7. Parts identical to those of the first embodiment are given the same reference numerals, and their detailed description is omitted.

【００３５】本実施の形態にかかる業務処理装置は、図
５に示すように音声入力部１１、音声認識リソース１
２′、音声認識部１３、フィルタ部１４、このフィルタ
部１４を通過した言語要素コードを利用するアプリケー
ションプログラム１５から構成され、図１に示すものと
異なるのは、フィルタ部１４がアプリケーションプログ
ラム１５から商品の合計金額（代金）のデータを受取り
可能な点である。As shown in FIG. 5, the business processing device according to the present embodiment includes a voice input unit 11, a speech recognition resource 12', a speech recognition unit 13, a filter unit 14, and an application program 15 that uses the language element codes passed through the filter unit 14. It differs from the device shown in FIG. 1 in that the filter unit 14 can receive data on the total price (payment amount) of the products from the application program 15.

【００３６】また、本実施の形態における音声認識リソ
ース１２′は、図６に示すように金額データから構成さ
れる点で、第１の実施の形態とは異なる。例えば、「千
円」の語句には「１０００」の言語要素コードを対応さ
せ、「三千円」の語句に対しては「３０００」の言語要
素コードを対応させて記憶する。The speech recognition resource 12' in the present embodiment differs from that of the first embodiment in that it is composed of monetary amount data, as shown in FIG. 6. For example, the language element code "1000" is stored in association with the phrase "1,000 yen", and the language element code "3000" is stored in association with the phrase "3,000 yen".

【００３７】ところで、客からの預り金の処理を行う業
務処理装置では、アプリケーションプログラム１５が、
既に販売された商品の合計金額に関するデータをもって
おり、この合計金額より少ない預り金を客から預ること
は通常では考えられない。従って、商品の合計金額より
少ない預り金の言語要素コードが音声認識部１３から出
力された場合は、誤認識したものと判断して、その言語
要素コードを除去することによって、不要な言語要素コ
ードをアプリケーションプログラム１５に渡すことを防
止できる。In a business processing device that processes deposits received from customers, the application program 15 holds data on the total price of the products already sold, and it is not normally conceivable that a deposit smaller than this total price would be received from the customer. Therefore, when a language element code representing a deposit smaller than the total price of the products is output from the speech recognition unit 13, it is judged to be a misrecognition and the language element code is removed, which prevents an unnecessary language element code from being passed to the application program 15.

【００３８】このような原理に基づいて、本実施の形態
おけるフィルタ部１４は、アプリケーションプログラム
１５から受取った商品の合計金額を言語要素コードと比
較し、言語要素コードが商品の合計金額未満のときに
は、その言語要素コードを除去してアプリケーションプ
ログラム１５に渡さないようにするように構成する。Based on this principle, the filter unit 14 in the present embodiment is configured to compare the total price of the products received from the application program 15 with the language element code and, when the language element code is less than the total price of the products, to remove the language element code so that it is not passed to the application program 15.

【００３９】このような構成の本発明の実施の形態にお
いては、例えば本装置の使用者が図７（ａ）に示すよう
に「三千円」と発声すると、この音声は音声入力部１１
でデジタル信号に変換されて音声認識部１３に供給され
る。そして、音声認識部１３で音声認識リソース１２′
が参照され、入力された音声と予め音声認識リソース内
で定義された語句との類似性・近似性が検出され、「三
千円」に対して「３０００」なる言語要素コードが出力
される。In the embodiment of the present invention configured in this way, when, for example, the user of the device utters "3,000 yen" as shown in FIG. 7(a), this speech is converted into a digital signal by the voice input unit 11 and supplied to the speech recognition unit 13. The speech recognition unit 13 then refers to the speech recognition resource 12', detects the similarity between the input speech and the phrases defined in advance in the speech recognition resource, and outputs the language element code "3000" for "3,000 yen".

【００４０】一方、フィルタ部１４では、アプリケーシ
ョンプログラム１５からの「２５００」円なる商品の合
計金額と音声認識部１３からの言語要素コード「３００
０」とが比較される。この場合は、言語要素コードが商
品の合計金額以上となるので、正常に音声認識されたと
判断され、その言語要素コードはアプリケーションプロ
グラム１５へ渡される。The filter unit 14, in turn, compares the total price of the products, "2500" yen, received from the application program 15 with the language element code "3000" from the speech recognition unit 13. In this case, the language element code is equal to or greater than the total price of the products, so it is judged that the speech was recognized correctly, and the language element code is passed to the application program 15.

【００４１】これに対して、例えば本装置の使用者が図
７（ｂ）に示すように「×××」と発声すると、この音
声は音声入力部１１でデジタル信号に変換されて音声認
識部１３に供給される。そして、音声認識部１３で音声
認識リソース１２′が参照され、入力された音声と予め
音声認識リソース内で定義された語句との類似性・近似
性が検出され、「×××」に対して誤認識によって例え
ば「２０００」なる言語要素コードが出力されたとす
る。In contrast, when, for example, the user of the device utters "XXX" as shown in FIG. 7(b), this speech is converted into a digital signal by the voice input unit 11 and supplied to the speech recognition unit 13. The speech recognition unit 13 then refers to the speech recognition resource 12' and detects the similarity between the input speech and the phrases defined in advance in the speech recognition resource. Assume that, through erroneous recognition, the language element code "2000", for example, is output for "XXX".

【００４２】一方、フィルタ部１４では、アプリケーシ
ョンプログラム１５からの「２５００」円なる商品の合
計金額と音声認識部１３からの言語要素コード「２００
０」とが比較される。この場合は、言語要素コードが商
品の合計金額未満となるので、正常に音声認識されなか
ったと判断され、その言語要素コードはアプリケーショ
ンプログラム１５には渡されない。The filter unit 14, in turn, compares the total price of the products, "2500" yen, received from the application program 15 with the language element code "2000" from the speech recognition unit 13. In this case, the language element code is less than the total price of the products, so it is judged that the speech was not recognized correctly, and the language element code is not passed to the application program 15.
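
The comparison performed by the filter unit 14 in this embodiment can be sketched as below. This is a minimal illustration rather than the patented implementation; the function name `filter_deposit` and the use of `None` to represent a removed code are assumptions made for the example.

```python
from typing import Optional


def filter_deposit(code: int, total_price: int) -> Optional[int]:
    """Pass the code on only if it is not less than the total price.

    Returning None stands for "removed" (an assumed convention).
    """
    if code < total_price:
        return None  # judged to be a misrecognition
    return code


# The two cases from FIG. 7, with a total price of 2500 yen.
print(filter_deposit(3000, 2500))  # 3000 (passed to the application program)
print(filter_deposit(2000, 2500))  # None (removed)
```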

【００４３】このように、音声認識部１３からの言語要
素コードをアプリケーションプログラム１５からの商品
の合計金額と比較し、言語要素コードがその合計金額未
満でないときにのみ、その言語要素コードをアプリケー
ションプログラム１５に渡すようなフィルタ部１４を設
けたので、客と店員との会話から音声認識を行っても、
アプリケーションプログラム１５では不要な語句、例え
ば商品の合計金額からみれば通常では考えられないよう
な預り金額の音声、挨拶や世間話、雑音などについて
は、たとえ誤認識されて不要な言語要素コードが音声認
識部１３から出力されたとしても、フィルタ部１４によ
って除去され、アプリケーションプログラム１５には渡
されないため、アプリケーションプログラム１５の誤作
動を防止することができる。従って、話者の会話の中か
ら必要な語句だけを音声認識するような場合に使用して
も、音声を正確にコード化できる音声認識装置を提供す
ることができる。As described above, the filter unit 14 is provided so that the language element code from the speech recognition unit 13 is compared with the total price of the products received from the application program 15 and is passed to the application program 15 only when it is not less than that total price. Therefore, even if speech recognition is performed on a conversation between a customer and a clerk, phrases that are unnecessary for the application program 15, such as an utterance of a deposit amount that would not normally be conceivable given the total price of the products, greetings, small talk, and noise, are removed by the filter unit 14 and are not passed to the application program 15, even if they are erroneously recognized and unnecessary language element codes are output from the speech recognition unit 13. Malfunction of the application program 15 can thus be prevented. Accordingly, it is possible to provide a speech recognition device that can accurately encode speech even when it is used to recognize only the necessary phrases from a speaker's conversation.

【００４４】次に、本発明を電子式キャッシュレジス
タ、ＰＯＳ端末などで客からの預り金の処理を行うなど
を行う業務処理装置に適用した場合の第４の実施の形態
を図８を参照して説明する。なお、本実施の形態におけ
る業務処理装置の機能ブロック図、音声認識リソース１
２の構成図は、それぞれ図５、図６に示すものと同様で
るため、その詳細な説明を省略する。Next, a fourth embodiment, in which the present invention is applied to a business processing device such as an electronic cash register or POS terminal that processes deposits received from customers, will be described with reference to FIG. 8. The functional block diagram of the business processing device and the configuration of the speech recognition resource 12 in the present embodiment are the same as those shown in FIGS. 5 and 6, respectively, and their detailed description is omitted.

【００４５】本実施の形態におけるフィルタ部１４にお
いて、上記第３の実施の形態と異なるのは、客からの預
り金額を音声認識した言語要素コードがアプリケーショ
ンプログラム１５からの商品の合計金額からみれば通常
では考えられないようなものか否かを、大小関係に基づ
いて判断する代りに、商品の合計金額からみれば通常は
満たすような条件を定め、この条件に基づいて判断する
点で異なる。The filter unit 14 in the present embodiment differs from that of the third embodiment in the following respect. Instead of using a magnitude comparison to judge whether the language element code obtained by recognizing the deposit amount from the customer is one that would not normally be conceivable given the total price of the products received from the application program 15, a condition that the deposit amount would normally satisfy given the total price is defined, and the judgment is made on the basis of this condition.

【００４６】例えば、商品の合計金額が２５５円などの
５の倍数である場合には、預り金額としては、通常は１
００５円、５００円など５の倍数であることは考えられ
ても、１００６円ということは考えられない。このよう
なことを考慮すると、フィルタ部１４で設定する条件と
しては、例えば言語要素コード（預り金額）が５の倍数
であるとすればよい。ここでは、さらに条件を絞り込
み、その言語要素コード（預り金額）から商品の合計金
額を引いた値が５の倍数でもあるという条件を予め設定
しておく。なお、予め条件を複数用意しておき、合計金
額によって必要な条件を選択するようにしてもよい。For example, when the total price of the products is a multiple of 5, such as 255 yen, a deposit amount that is a multiple of 5, such as 1005 yen or 500 yen, is conceivable, but a deposit amount of 1006 yen is not. In view of this, the condition set in the filter unit 14 may be, for example, that the language element code (deposit amount) is a multiple of 5. Here, the condition is narrowed further: it is set in advance that the value obtained by subtracting the total price of the products from the language element code (deposit amount) is also a multiple of 5. A plurality of conditions may be prepared in advance, and the required condition may be selected according to the total price.

【００４７】このような構成の本発明の実施の形態にお
いては、例えば本装置の使用者が図８（ａ）に示すよう
に「千五円」と発声すると、この音声は音声入力部１１
でデジタル信号に変換されて音声認識部１３に供給され
る。そして、音声認識部１３で音声認識リソース１２が
参照され、入力された音声と予め音声認識リソース内で
定義された語句との類似性・近似性が検出され、「千五
円」に対して「１００５」なる言語要素コードが出力さ
れる。In the embodiment of the present invention configured in this way, when, for example, the user of the device utters "1,005 yen" as shown in FIG. 8(a), this speech is converted into a digital signal by the voice input unit 11 and supplied to the speech recognition unit 13. The speech recognition unit 13 then refers to the speech recognition resource 12, detects the similarity between the input speech and the phrases defined in advance in the speech recognition resource, and outputs the language element code "1005" for "1,005 yen".

【００４８】一方、フィルタ部１４では、アプリケーシ
ョンプログラム１５からの「２５５」円なる商品の合計
金額により、上述した条件が選択され、音声認識部１３
からの言語要素コード「１００５」が上述した条件を満
たすか否かが判断される。この場合は、条件を満たすの
で、正常に音声認識されたと判断され、その言語要素コ
ードはアプリケーションプログラム１５へ渡される。In the filter unit 14, in turn, the above condition is selected on the basis of the total price of the products, "255" yen, received from the application program 15, and it is judged whether the language element code "1005" from the speech recognition unit 13 satisfies that condition. In this case the condition is satisfied, so it is judged that the speech was recognized correctly, and the language element code is passed to the application program 15.

【００４９】また、例えば本装置の使用者が図８（ｂ）
に示すように「五百円」と発声すると、この音声は音声
入力部１１でデジタル信号に変換されて音声認識部１３
に供給される。そして、音声認識部１３で音声認識リソ
ース１２が参照され、入力された音声と予め音声認識リ
ソース内で定義された語句との類似性・近似性が検出さ
れ、「五百円」に対して「５００」なる言語要素コード
が出力される。Likewise, when, for example, the user of the device utters "500 yen" as shown in FIG. 8(b), this speech is converted into a digital signal by the voice input unit 11 and supplied to the speech recognition unit 13. The speech recognition unit 13 then refers to the speech recognition resource 12, detects the similarity between the input speech and the phrases defined in advance in the speech recognition resource, and outputs the language element code "500" for "500 yen".

【００５０】一方、フィルタ部１４では、アプリケーシ
ョンプログラム１５からの「２５５」円なる商品の合計
金額により、上述した条件が選択され、音声認識部１３
からの言語要素コード「５００」が上述した条件を満た
すか否かが判断される。この場合は、条件を満たすの
で、正常に音声認識されたと判断され、その言語要素コ
ードはアプリケーションプログラム１５へ渡される。In the filter unit 14, in turn, the above condition is selected on the basis of the total price of the products, "255" yen, received from the application program 15, and it is judged whether the language element code "500" from the speech recognition unit 13 satisfies that condition. In this case the condition is satisfied, so it is judged that the speech was recognized correctly, and the language element code is passed to the application program 15.

【００５１】これに対して、例えば本装置の使用者が図
８（ｃ）に示すように「×××」と発声すると、この音
声は音声入力部１１でデジタル信号に変換されて音声認
識部１３に供給される。そして、音声認識部１３で音声
認識リソース１２が参照され、入力された音声と予め音
声認識リソース内で定義された語句との類似性・近似性
が検出され、「×××」に対して誤認識によって例えば
「１００６」なる言語要素コードが出力されたとする。In contrast, when, for example, the user of the device utters "XXX" as shown in FIG. 8(c), this speech is converted into a digital signal by the voice input unit 11 and supplied to the speech recognition unit 13. The speech recognition unit 13 then refers to the speech recognition resource 12 and detects the similarity between the input speech and the phrases defined in advance in the speech recognition resource. Assume that, through erroneous recognition, the language element code "1006", for example, is output for "XXX".

【００５２】一方、フィルタ部１４では、アプリケーシ
ョンプログラム１５からの「２５５」円なる商品の合計
金額により、上述した条件が選択され、音声認識部１３
からの言語要素コード「１００６」が上述した条件を満
たすか否かが判断される。この場合は、条件を満たさな
いので、正常に音声認識されなかったと判断され、その
言語要素コードはアプリケーションプログラム１５には
渡されない。In the filter unit 14, in turn, the above condition is selected on the basis of the total price of the products, "255" yen, received from the application program 15, and it is judged whether the language element code "1006" from the speech recognition unit 13 satisfies that condition. In this case the condition is not satisfied, so it is judged that the speech was not recognized correctly, and the language element code is not passed to the application program 15.
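
The three cases of FIG. 8 can be checked with a short sketch of the condition described for this embodiment: the deposit amount is a multiple of 5, and the deposit amount minus the total price is also a multiple of 5. The function name `satisfies_condition` and the `unit` parameter are assumptions introduced for illustration.

```python
def satisfies_condition(deposit: int, total_price: int, unit: int = 5) -> bool:
    """True if both the deposit and (deposit - total price) are multiples of unit."""
    return deposit % unit == 0 and (deposit - total_price) % unit == 0


total_price = 255  # total price reported by the application program
for deposit in (1005, 500, 1006):
    verdict = "passed" if satisfies_condition(deposit, total_price) else "removed"
    print(deposit, verdict)
```

Under this condition 1005 and 500 are passed and 1006 is removed, in line with the description above.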

【００５３】このように、音声認識部１３からの言語要
素コードをアプリケーションプログラム１５からの商品
の合計金額に基づいて決められた所定の条件を満たして
いるときにのみ、その言語要素コードをアプリケーショ
ンプログラム１５に渡すようなフィルタ部１４を設けた
ので、客と店員との会話から音声認識を行っても、アプ
リケーションプログラム１５では不要な語句、例えば商
品の合計金額からみれば通常では考えられないような預
り金額の音声、挨拶や世間話、雑音などについては、た
とえ誤認識されて不要な言語要素コードが音声認識部１
３から出力されたとしても、フィルタ部１４によって除
去され、アプリケーションプログラム１５には渡されな
いため、アプリケーションプログラム１５の誤作動を防
止することができる。従って、上記第１の実施の形態と
同様に、話者の会話の中から必要な語句だけを音声認識
するような場合に使用しても、音声を正確にコード化で
きる音声認識装置を提供することができる。As described above, the filter unit 14 is provided so that the language element code from the speech recognition unit 13 is passed to the application program 15 only when it satisfies a predetermined condition determined on the basis of the total price of the products received from the application program 15. Therefore, even if speech recognition is performed on a conversation between a customer and a clerk, phrases that are unnecessary for the application program 15, such as an utterance of a deposit amount that would not normally be conceivable given the total price of the products, greetings, small talk, and noise, are removed by the filter unit 14 and are not passed to the application program 15, even if they are erroneously recognized and unnecessary language element codes are output from the speech recognition unit 13. Malfunction of the application program 15 can thus be prevented. Accordingly, as in the first embodiment, it is possible to provide a speech recognition device that can accurately encode speech even when it is used to recognize only the necessary phrases from a speaker's conversation.

【００５４】なお、本実施の形態におけるフィルタ部１
４で設定する条件としては、商品の合計金額の下一桁が
０の場合、例えば１５５０円などの場合は、言語要素コ
ード（預り金額）が５０の倍数であり、かつその言語要
素コード（預り金額）から商品の合計金額を引いた値が
５０の倍数でもあるという条件を予め設定してもよい。
これにより、「５の倍数」とした場合に比して、預り金
額として通常では考えられない１５６０円なども除去す
ることができるようになる。As the condition set in the filter unit 14 in the present embodiment, when the last digit of the total price of the products is 0, for example 1550 yen, it may be set in advance that the language element code (deposit amount) is a multiple of 50 and that the value obtained by subtracting the total price of the products from the language element code (deposit amount) is also a multiple of 50. Compared with the "multiple of 5" condition, this also makes it possible to remove amounts such as 1560 yen that would not normally be conceivable as a deposit amount.
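
Selecting the condition from the total price, as described above, might look like the following sketch. The names `select_unit` and `passes` are hypothetical, and the fallback of 1 (no constraint) for totals that are not multiples of 5 is an assumption, since the text does not say what happens in that case.

```python
def select_unit(total_price: int) -> int:
    """Choose the multiple to test against, based on the total price."""
    if total_price % 10 == 0:
        return 50  # last digit is 0, e.g. 1550 yen
    if total_price % 5 == 0:
        return 5   # e.g. 255 yen
    return 1       # assumption: no constraint for other totals


def passes(deposit: int, total_price: int) -> bool:
    """True if the deposit and the difference are multiples of the selected unit."""
    unit = select_unit(total_price)
    return deposit % unit == 0 and (deposit - total_price) % unit == 0


print(passes(1560, 1550))  # False: 1560 is not a multiple of 50
print(passes(2000, 1550))  # True: 2000 and 450 are both multiples of 50
```

Note that 1560 would have satisfied the plain "multiple of 5" condition, which is why the stricter unit is selected for this total.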

【００５５】次に、本発明を電子式キャッシュレジス
タ、ＰＯＳ端末などで客からの預り金の処理を行うなど
を行う業務処理装置に適用した場合の第５の実施の形態
を図９ないし図１１を参照して説明する。なお、本実施
の形態において、第１の実施の形態と同一部分には同一
符号を付してその詳細な説明を省略する。Next, a fifth embodiment, in which the present invention is applied to a business processing device such as an electronic cash register or POS terminal that processes deposits received from customers, will be described with reference to FIGS. 9 to 11. In the present embodiment, parts identical to those of the first embodiment are given the same reference numerals, and their detailed description is omitted.

【００５６】本実施の形態にかかる業務処理装置は、図
９に示すように音声入力部１１、音声認識リソース１
２、音声認識部１３、フィルタ部１４、このフィルタ部
１４からの言語要素コードを利用するアプリケーション
プログラム１５の他、フィルタ部１４で音声認識部１３
からの言語要素コードを除去するか否かを判断する際に
参照する取扱い商品の商品情報を集めた商品情報データ
ベース（商品情報記憶手段）２１から構成される。As shown in FIG. 9, the business processing device according to the present embodiment includes a voice input unit 11, a speech recognition resource 12, a speech recognition unit 13, a filter unit 14, and an application program 15 that uses the language element codes from the filter unit 14, as well as a product information database (product information storage means) 21. The database collects the product information of the products handled and is referred to by the filter unit 14 when it determines whether to remove a language element code from the speech recognition unit 13.

【００５７】本実施の形態における音声認識部１３、フ
ィルタ部１４、及びアプリケーションプログラム１５
は、ＣＰＵ・ＲＯＭ・ＲＡＭを備えたパーソナルコンピ
ュータなどから構成される。これら音声認識部１３、フ
ィルタ部１４、及びアプリケーションプログラム１５
は、具体的には例えばハードディスク装置などの記憶装
置又はＲＯＭなどのメモリに記憶され、上記パーソナル
コンピュータのＣＰＵが読取可能なソフトウエアプログ
ラムで構成される。The speech recognition unit 13, the filter unit 14, and the application program 15 in the present embodiment are implemented on, for example, a personal computer having a CPU, ROM, and RAM. Specifically, they are software programs that are stored in a storage device such as a hard disk device or in a memory such as a ROM and that can be read by the CPU of the personal computer.

【００５８】上記音声認識リソース１２は、例えばハー
ドディスク装置などの記憶装置で構成される。具体的な
構成は、図２に示すものと同様であるためその詳細な説
明を省略する。The speech recognition resource 12 is constituted by a storage device such as a hard disk device. The specific configuration is the same as that shown in FIG. 2, and a detailed description thereof will be omitted.

【００５９】上記商品情報データベース２１は、例えば
ハードディスク装置などの記憶装置で構成される。具体
的には図１０に示すような取扱い商品の商品名、言語要
素コード、単価、在庫数をそれぞれ関連づけた商品情報
を集めて構成される。例えば商品名が「商品Ａ」には、
「１０１」なる言語要素コード、「１００」円なる単
価、「１０００」個なる在庫数が関連づけられている。
なお、この商品情報データベース２１の商品情報は、商
品の販売処理などによって更新されるようになってい
る。The product information database 21 is constituted by a storage device such as a hard disk device. Specifically, as shown in FIG. 10, it is a collection of product information in which the product name, language element code, unit price, and stock quantity of each product handled are associated with one another. For example, the product name "Product A" is associated with the language element code "101", a unit price of "100" yen, and a stock quantity of "1000". The product information in the product information database 21 is updated by, for example, product sales processing.

【００６０】本実施の形態におけるフィルタ部１４は、
音声認識部１３からの言語要素コードをキーとして商品
情報データベース２１から商品情報を検索する。そし
て、該当する商品情報がないときにはその言語要素コー
ドを除去し、該当する商品情報があればその言語要素コ
ードをアプリケーションプログラム１５へ渡す。The filter unit 14 in the present embodiment searches the product information database 21 for product information using the language element code from the speech recognition unit 13 as a key. When there is no corresponding product information, it removes the language element code; when there is corresponding product information, it passes the language element code to the application program 15.

【００６１】つまり、音声認識部１３で誤認識された場
合、出力される言語要素コードは実際に商品情報が存在
しないものである可能性が高い。このため、音声認識部
１３からの言語要素コードを商品情報データベース２１
で検索し、商品情報が存在する言語要素コードか否かを
判断することによって、商品情報が存在しないような不
要な言語要素コードを除去してアプリケーションプログ
ラム１５に渡さないようにするものである。In other words, when the speech recognition unit 13 makes an erroneous recognition, the language element code it outputs is likely to be one for which no product information actually exists. The language element code from the speech recognition unit 13 is therefore looked up in the product information database 21 to determine whether product information exists for it, so that unnecessary language element codes for which no product information exists are removed and not passed to the application program 15.

【００６２】このような構成の本発明の実施の形態にお
いては、例えば本装置の使用者が図１１（ａ）に示すよ
うに「商品Ａ」と発声すると、この音声は音声入力部１
１でデジタル信号に変換されて音声認識部１３に供給さ
れる。そして、音声認識部１３で音声認識リソース１２
が参照され、入力された音声と予め音声認識リソース内
で定義された語句との類似性・近似性が検出され、「商
品Ａ」に対して「０１０１」なる言語要素コードが出力
される。この言語要素コードは次のフィルタ部１４に渡
される。すると、フィルタ部１４では、その言語要素コ
ードをキーとして商品情報データベース２１から商品情
報が検索される。この場合は、該当する商品情報がある
ので、その言語要素コードがアプリケーションプログラ
ム１５へ渡される。In the embodiment of the present invention configured in this way, when, for example, the user of the device utters "Product A" as shown in FIG. 11(a), this speech is converted into a digital signal by the voice input unit 11 and supplied to the speech recognition unit 13. The speech recognition unit 13 then refers to the speech recognition resource 12, detects the similarity between the input speech and the phrases defined in advance in the speech recognition resource, and outputs the language element code "0101" for "Product A". This language element code is passed to the filter unit 14, which searches the product information database 21 for product information using the language element code as a key. In this case corresponding product information exists, so the language element code is passed to the application program 15.

【００６３】これに対して、本装置の使用者が図１１
（ｂ）に示すように「×××」と発声すると、この音声
は音声入力部１１でデジタル信号に変換されて音声認識
部１３に供給される。そして、音声認識部１３で音声認
識リソース１２が参照され、入力された音声と予め音声
認識リソース内で定義された語句との類似性・近似性が
検出され、「××」に対して誤認識によって例えば音声
認識部１３によって未定義の「０１０５」なる言語要素
コードが出力されたとする。In contrast, when the user of the device utters "XXX" as shown in FIG. 11(b), this speech is converted into a digital signal by the voice input unit 11 and supplied to the speech recognition unit 13. The speech recognition unit 13 then refers to the speech recognition resource 12 and detects the similarity between the input speech and the phrases defined in advance in the speech recognition resource. Assume that, through erroneous recognition, the speech recognition unit 13 outputs, for example, the undefined language element code "0105" for "xx".

【００６４】すると、フィルタ部１４では、その言語要
素コードをキーとして商品情報データベース２１から商
品情報が検索される。この場合は、該当する商品情報が
ないので、その言語要素コードは除去され、アプリケー
ションプログラム１５へ渡されない。Then, the filter unit 14 searches the product information database 21 for product information using the language element code as a key. In this case, since there is no corresponding product information, the language element code is removed and is not passed to the application program 15.
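
The lookup performed by the filter unit 14 can be sketched with an in-memory dictionary standing in for the product information database 21. The record layout, the names `PRODUCT_DB` and `filter_by_product`, and the use of the zero-padded key "0101" are assumptions for illustration; only the values for "Product A" come from the description.

```python
from typing import Optional

# Stand-in for the product information database 21 (assumed structure).
PRODUCT_DB = {
    "0101": {"name": "Product A", "unit_price": 100, "stock": 1000},
}


def filter_by_product(code: str, db: dict = PRODUCT_DB) -> Optional[str]:
    """Pass the code on only if product information exists for it."""
    return code if code in db else None  # None stands for "removed"


print(filter_by_product("0101"))  # 0101 (product information found)
print(filter_by_product("0105"))  # None (no product information, removed)
```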

【００６５】このように、音声認識部１３からの言語要
素コードをキーとして商品情報データベースから商品情
報を検索し、該当する商品情報があるときにのみ、その
言語要素コードをアプリケーションプログラム１５に渡
すようなフィルタ部１４を設けたので、客と店員との会
話から音声認識を行っても、アプリケーションプログラ
ム１５では不要な語句、例えば挨拶や世間話、雑音など
については、たとえ誤認識されて不要な言語要素コード
が音声認識部１３から出力されたとしても、フィルタ部
１４によって除去され、アプリケーションプログラム１
５には渡されないため、アプリケーションプログラム１
５の誤作動を防止することができる。従って、上記第１
の実施の形態と同様に、話者の会話の中から必要な語句
だけを音声認識するような場合に使用しても、音声を正
確にコード化できる音声認識装置を提供することができ
る。As described above, the filter unit 14 is provided so that product information is searched for in the product information database using the language element code from the speech recognition unit 13 as a key, and the language element code is passed to the application program 15 only when corresponding product information exists. Therefore, even if speech recognition is performed on a conversation between a customer and a clerk, phrases that are unnecessary for the application program 15, such as greetings, small talk, and noise, are removed by the filter unit 14 and are not passed to the application program 15, even if they are erroneously recognized and unnecessary language element codes are output from the speech recognition unit 13. Malfunction of the application program 15 can thus be prevented. Accordingly, as in the first embodiment, it is possible to provide a speech recognition device that can accurately encode speech even when it is used to recognize only the necessary phrases from a speaker's conversation.

【００６６】なお、本実施の形態におけるフィルタ部１
４において、音声認識部１３からの言語要素コードをキ
ーとして商品情報データベース２１から検索したとき
に、商品情報が検索できたとしても、在庫数が０である
場合には、その言語要素コードを除去するようにしても
よい。これによって、在庫数が０の商品の言語要素コー
ドがアプリケーションプログラムへ渡されることを防止
できる。In the filter unit 14 in the present embodiment, even when product information is found by searching the product information database 21 with the language element code from the speech recognition unit 13 as a key, the language element code may be removed if the stock quantity is 0. This prevents the language element code of a product whose stock quantity is 0 from being passed to the application program.
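
The stock-quantity variation can be sketched in the same way. The record layout and the function name `filter_by_stock` are assumptions, and the out-of-stock entry "0102" ("Product B") is a hypothetical record added only to exercise the check.

```python
from typing import Optional

# Stand-in for the product information database 21 (assumed structure).
# "0102" / "Product B" is a hypothetical out-of-stock record.
product_db = {
    "0101": {"name": "Product A", "unit_price": 100, "stock": 1000},
    "0102": {"name": "Product B", "unit_price": 200, "stock": 0},
}


def filter_by_stock(code: str, db: dict) -> Optional[str]:
    """Pass the code on only if the product exists and its stock is not 0."""
    record = db.get(code)
    if record is None or record["stock"] == 0:
        return None  # removed: unknown product or nothing in stock
    return code


print(filter_by_stock("0101", product_db))  # 0101
print(filter_by_stock("0102", product_db))  # None (stock quantity is 0)
```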

【００６７】上記第１〜第５の実施の形態までは、それ
ぞれ別々に適用する場合について説明したが、これらの
実施の形態を組合わせて適用してもよい。Although the above first to fifth embodiments have been described with respect to the case where they are applied separately, these embodiments may be applied in combination.
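
One way to picture combining embodiments is to chain their checks so that a language element code is passed only when every check accepts it. The sketch below combines the magnitude comparison of the third embodiment with the multiple-of-5 condition of the fourth; the `combine` helper and the choice of which checks to chain are assumptions, since the text does not specify how the embodiments are combined.

```python
from typing import Callable, Iterable, Optional

Check = Callable[[int], bool]


def combine(checks: Iterable[Check]) -> Callable[[int], Optional[int]]:
    """Build a filter that passes a code only if every check accepts it."""
    checks = list(checks)

    def filt(code: int) -> Optional[int]:
        return code if all(check(code) for check in checks) else None

    return filt


total = 255  # total price reported by the application program
deposit_filter = combine([
    lambda d: d >= total,                           # third embodiment
    lambda d: d % 5 == 0 and (d - total) % 5 == 0,  # fourth embodiment
])

print(deposit_filter(1005))  # 1005 (accepted by both checks)
print(deposit_filter(250))   # None (less than the total price)
print(deposit_filter(1006))  # None (not a multiple of 5)
```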

【００６８】[0068]

【発明の効果】以上詳述したように本発明によれば、客
と店員との会話から音声認識を行っても、アプリケーシ
ョンプログラムでは不要な語句、例えば挨拶や世間話、
雑音などについては、たとえ誤認識されて不要な言語要
素コードが音声認識手段から出力されたとしても、フィ
ルタ手段によって除去され、アプリケーションプログラ
ムには渡されないため、アプリケーションプログラムの
誤作動を防止することができる。従って、話者の会話の
中から必要な語句だけを音声認識するような場合に使用
しても、音声を正確にコード化できる音声認識装置を提
供することができる。As described in detail above, according to the present invention, even if speech recognition is performed on a conversation between a customer and a clerk, phrases that are unnecessary for the application program, such as greetings, small talk, and noise, are removed by the filter means and are not passed to the application program, even if they are erroneously recognized and unnecessary language element codes are output from the speech recognition means. Malfunction of the application program can thus be prevented. Accordingly, it is possible to provide a speech recognition device that can accurately encode speech even when it is used to recognize only the necessary phrases from a speaker's conversation.

【００６９】また、音声認識手段では、予め定義されて
いる音声認識されるべき語句の中から最も近いものを選
択するが、不要な言語要素コードであればフィルタ手段
で除去することができるので、音声認識されるべき語句
について使用者の特徴的な言回しのすべてを音声認識リ
ソースに定義しなくても、アプリケーションプログラム
の誤動作を防止できる。In addition, the speech recognition means selects the closest match from among the predefined phrases to be recognized, but any unnecessary language element code that results can be removed by the filter means. Malfunction of the application program can therefore be prevented without defining, in the speech recognition resource, every characteristic turn of phrase a user might use for the phrases to be recognized.

[Brief description of the drawings]

【図１】本発明の第１の実施の形態にかかる業務処理装
置の構成を示す機能ブロック図。FIG. 1 is a functional block diagram showing a configuration of a business processing device according to a first embodiment of the present invention.

【図２】図１に示す音声認識リソースの構成を示す図。FIG. 2 is a diagram showing a configuration of a speech recognition resource shown in FIG.

【図３】本実施の形態の作用を説明する図。FIG. 3 is a diagram illustrating an operation of the embodiment.

【図４】本発明の第２の実施の形態にかかる業務処理装
置の作用を説明する図。FIG. 4 is an exemplary view for explaining the operation of a business processing device according to a second embodiment of the present invention;

【図５】本発明の第３の実施の形態にかかる業務処理装
置の構成を示す機能ブロック図。FIG. 5 is a functional block diagram showing a configuration of a business processing device according to a third embodiment of the present invention.

【図６】図５に示す音声認識リソースの構成を示す図。FIG. 6 is a diagram showing a configuration of a speech recognition resource shown in FIG. 5;

【図７】本実施の形態の作用を説明する図。FIG. 7 is a diagram illustrating the operation of the present embodiment.

【図８】本発明の第４の実施の形態にかかる業務処理装
置の構成を示す機能ブロック図。FIG. 8 is a functional block diagram showing a configuration of a business processing device according to a fourth embodiment of the present invention.

【図９】本発明の第５の実施の形態にかかる業務処理装
置の構成を示す機能ブロック図。FIG. 9 is a functional block diagram showing a configuration of a business processing device according to a fifth embodiment of the present invention.

【図１０】図９に示す音声認識リソースの構成を示す
図。FIG. 10 is a diagram showing a configuration of a speech recognition resource shown in FIG. 9;

【図１１】本実施の形態の作用を説明する図。FIG. 11 is a diagram illustrating an operation of the present embodiment.

【図１２】従来の音声認識装置を適用した業務処理装置
の構成を示す機能ブロック図。FIG. 12 is a functional block diagram showing a configuration of a business processing device to which a conventional voice recognition device is applied.

[Explanation of symbols]

１１…音声入力部１１ａ…マイク１１ｂ…Ａ／Ｄ変換器１２…音声認識リソース１３…音声認識部１４…フィルタ部１５…アプリケーションプログラム２１…商品情報データベース DESCRIPTION OF SYMBOLS 11: voice input unit, 11a: microphone, 11b: A/D converter, 12: speech recognition resource, 13: speech recognition unit, 14: filter unit, 15: application program, 21: product information database

Claims

[Claims]

1. A speech recognition device comprising: voice input means for inputting the voice of a speaker; speech recognition means for recognizing phrases from the speech input from the voice input means; language element code storage means for storing a plurality of phrases to be recognized in advance and a language element code corresponding to each phrase; language element code output means for extracting from the language element code storage means, and outputting, the language element code corresponding to each phrase when the phrases recognized by the speech recognition means include a phrase to be recognized in advance; and filter means for removing unnecessary language element codes from among the language element codes output from the language element code output means and passing the necessary ones.

2. A speech recognition device comprising: voice input means for inputting the voice of a speaker; speech recognition means for recognizing phrases from the speech input from the voice input means; language element code storage means for storing a plurality of phrases to be recognized in advance and a language element code corresponding to each phrase; language element code output means for extracting from the language element code storage means, and outputting, the language element code corresponding to each phrase when the phrases recognized by the speech recognition means include a phrase to be recognized in advance; and filter means for removing all of the language element codes when at least one necessary language element code is not included among the plurality of language element codes output from the language element code output means in one speech recognition process, and for passing those language element codes when all of the necessary language element codes are present.

3. A speech recognition device comprising: voice input means for inputting the voice of a speaker; speech recognition means for recognizing phrases from the speech input from the voice input means; language element code storage means for storing a plurality of phrases to be recognized in advance and, corresponding to each phrase, a language element code within a predetermined numerical range; language element code output means for extracting from the language element code storage means, and outputting, the language element code corresponding to each phrase when the phrases recognized by the speech recognition means include a phrase to be recognized in advance; and filter means for removing the language element code output from the language element code output means when it is outside the predetermined numerical range, and for passing the language element code when it is within the predetermined numerical range.

4. A voice recognition device comprising: voice input means for inputting the voice of a speaker; voice recognition means for recognizing words and phrases from the voice input through the voice input means; language element code storage means for storing a plurality of words and phrases to be recognized in advance that indicate a deposit received from a customer, and a language element code indicating the amount of money corresponding to each of them; language element code output means for extracting from the language element code storage means, and outputting, the language element code corresponding to each word or phrase when the words and phrases recognized by the voice recognition means include a word or phrase to be recognized in advance; and filter means for taking in price data from outside, removing a language element code output from the language element code output means when the amount it indicates is lower than the price in the externally supplied data, and passing the language element code when the amount is equal to or greater than that price.

5. A voice recognition device comprising: voice input means for inputting the voice of a speaker; voice recognition means for recognizing words and phrases from the voice input through the voice input means; language element code storage means for storing a plurality of words and phrases to be recognized in advance that indicate a deposit received from a customer, and a language element code indicating the amount of money corresponding to each of them; language element code output means for extracting from the language element code storage means, and outputting, the language element code corresponding to each word or phrase when the words and phrases recognized by the voice recognition means include a word or phrase to be recognized in advance; and filter means for taking in price data from outside, removing a language element code output from the language element code output means when the amount it indicates does not satisfy a condition derived from the price in the externally supplied data, and passing the language element code when the condition is satisfied.

6. A voice recognition device comprising: voice input means for inputting the voice of a speaker; voice recognition means for recognizing words and phrases from the voice input through the voice input means; language element code storage means for storing a plurality of words and phrases to be recognized in advance and a language element code corresponding to each of them; product information storage means holding a collection of product information associated with the language element codes; language element code output means for extracting from the language element code storage means, and outputting, the language element code corresponding to each word or phrase when the words and phrases recognized by the voice recognition means include a word or phrase to be recognized in advance; and filter means for searching the product information storage means using a language element code output from the language element code output means as a key, removing the language element code when no corresponding product information exists, and passing the language element code when corresponding product information exists.
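
The filter means recited in claims 2, 3, 4 and 6 can be illustrated with a minimal sketch. This is not the implementation disclosed in the specification. The function names, the representation of a language element code as a plain integer, the inclusive range bounds, and the reading of a code's value directly as an amount of money are all assumptions made for illustration only.

```python
from typing import List, Mapping, Set

Codes = List[int]  # assumption: each language element code is a plain integer


def filter_required(codes: Codes, required: Set[int]) -> Codes:
    """Claim 2 style: pass the whole result only if every required code is present."""
    return list(codes) if required.issubset(codes) else []


def filter_range(codes: Codes, low: int, high: int) -> Codes:
    """Claim 3 style: remove codes outside [low, high] (inclusive bounds assumed)."""
    return [c for c in codes if low <= c <= high]


def filter_deposit(codes: Codes, price: int) -> Codes:
    """Claim 4 style: the code's value is read as an amount (assumption);
    remove amounts lower than the externally supplied price."""
    return [c for c in codes if c >= price]


def filter_known_products(codes: Codes, products: Mapping[int, str]) -> Codes:
    """Claim 6 style: use each code as a lookup key; remove codes with no product record."""
    return [c for c in codes if c in products]


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    print(filter_required([10, 20, 30], {10, 30}))
    print(filter_required([10, 20], {10, 30}))
    print(filter_range([5, 150, 99], 0, 100))
    print(filter_deposit([500, 1000], 800))
    print(filter_known_products([1, 2, 3], {1: "item A", 3: "item B"}))
```

In each case only the codes that survive the filter would be handed on to the application program. The claim 5 variant would differ from `filter_deposit` only in replacing the comparison `c >= price` with whatever condition is derived from the price.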