JPS60247697A

JPS60247697A - Voice recognition responder

Info

Publication number: JPS60247697A
Application number: JP59103625A
Authority: JP
Inventors: 千本　浩之; 洋一竹林
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1984-05-24
Filing date: 1984-05-24
Publication date: 1985-12-07
Also published as: JPH0518118B2

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Technical field of the invention]

The present invention relates to a voice recognition response device used in an information processing system that takes voice input.

[Technical background of the invention and its problems]

In recent years, speech recognition and speech synthesis technology has advanced remarkably. For example, continuous speech recognition and speaker-independent speech recognition have become possible, and highly accurate speech synthesis has also become possible.

Telephone voice response services that use these technologies to provide various services over public telephone lines, such as bank account balance inquiries, have been developed, and their usefulness is attracting attention. The users of this kind of system are an unspecified large number of people: some, such as the elderly and children, are unfamiliar with the system, while others are experienced and use it several times a day. Nevertheless, in conventional systems the content (format) of the voice response is fixed, and both the time from the user's voice input to the output of the voice response and the speed of the voice response are constant. Such systems therefore cannot be said to be easy for every user to handle, and dialogue between human and machine has not been smooth. For example, in a bank account balance inquiry service by telephone, a user who wants to enter the account number "123..." by voice first says "1" on hearing the "beep" input request tone. After about 10 seconds, a voice response of "1" is heard for confirmation.

The user then says "2", and so on. Users who are accustomed to this kind of system find the response time tediously long and become irritated, while users who are not accustomed to it find the responses hard to understand.

[Purpose of the invention]

An object of the present invention is to provide a voice recognition response device that enables smooth dialogue between human and machine.

[Summary of the invention]

The present invention comprises: input means for inputting a voice signal; voice detection means for analyzing the voice signal input from the input means as a voice pattern and for detecting a voice response signal; voice recognition means for recognizing the voice pattern analyzed by the voice detection means; voice output means for outputting a voice response signal based on the recognition result of the voice pattern output from the voice recognition means and for outputting a voice input request signal; and measuring means for measuring time data of the voice input request signal output from the voice output means and of the voice section signal output from the voice detection means. The invention is characterized in that the voice output means controls and outputs the voice response signal based on the time data output from the measuring means.

[Effect of the invention]

According to the present invention, an appropriate response can be given to each user, so that dialogue between human and machine proceeds smoothly and practicality for the user improves.

[Embodiments of the invention]

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a schematic configuration diagram of a first embodiment of the present invention, FIG. 2 is a schematic diagram of the timing of the voice input request P, the voice input R1, and the voice response R2, and FIG. 3 is a processing flow diagram of the first embodiment. In the first embodiment, the time from the output of the voice input request until the user inputs voice is measured, and the voice response is controlled and output according to that time. The blocks inside the dotted line in FIG. 1 constitute the voice recognition response device, whose input and output are connected to a service terminal (not shown). For example, when the user dials a predetermined telephone number on a telephone set serving as the service terminal, the response "This is the account balance inquiry service. When you hear the beep, please say the digits of your account number one at a time, in order." is sent from the voice response output unit 7 through the handset. The present invention applies to the exchange of voice input and response from this point on.

First, the voice input start request unit 8 in FIG. 1 outputs the voice input request signal P to the terminal (not shown); the signal is simultaneously sent to the timing measurement unit 3 (step 11 in FIG. 3). The timing measurement unit 3 records the time at which it receives the voice input request signal P. On hearing the "beep" tone that constitutes the voice input request signal, the user speaks "1" into the handset (step 12 in FIG. 3). When this input voice R1 enters the analysis unit 1, A/D conversion, spectrum analysis, and similar processing are performed, and the input voice signal is converted into a sequence of feature parameters (a voice pattern) (step 13 in FIG. 3). The voice section detection unit 2 uses the energy information of the feature parameter sequence (voice pattern) output from the analysis unit 1 to detect the start and end of the voice pattern and cut out the voice section (step 14 in FIG. 3). On detecting the start and end of the voice pattern, the voice section detection unit sends a start signal and an end signal, respectively, to the timing measurement unit 3. The timing measurement unit 3 records the time at which it receives the start signal and calculates the time T1 from receipt of the voice input request signal P to receipt of the start signal (T1 in FIG. 2; step 15 in FIG. 3). Meanwhile, the voice section detection unit 2 sends the extracted voice pattern (feature parameter sequence) to the voice recognition unit 4. The voice recognition unit 4 recognizes the input voice pattern using a voice dictionary registered in advance in the dictionary memory 5 (step 16 in FIG. 3). This recognition is performed, for example, by a similarity calculation method. The recognition result from the voice recognition unit 4 is sent to the voice response control unit 6 together with T1 calculated by the timing measurement unit 3. The voice response control unit 6 controls and outputs the voice response R2 based on the length of T1 (step 17 in FIG. 3). There are three methods of control, as follows.
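
As an illustration of steps 14 and 15, the following Python sketch detects the start and end of a voice section from per-frame energy and derives T1. The fixed energy threshold, the 10 ms frame period, and all function and parameter names are assumptions made for this example; the source states only that energy information is used.

```python
from typing import List, Optional, Tuple


def detect_voice_section(energies: List[float],
                         threshold: float) -> Optional[Tuple[int, int]]:
    """Return (start_frame, end_frame) of the voice section, or None.

    Assumption: a frame counts as speech when its energy reaches a fixed
    threshold. The source does not specify the detection rule.
    """
    voiced = [i for i, e in enumerate(energies) if e >= threshold]
    if not voiced:
        return None
    return voiced[0], voiced[-1]


def measure_t1_ms(request_time_ms: int, capture_start_ms: int,
                  start_frame: int, frame_period_ms: int = 10) -> int:
    """Time T1 from the input request signal P to the start signal."""
    start_signal_ms = capture_start_ms + start_frame * frame_period_ms
    return start_signal_ms - request_time_ms


# Hypothetical frame energies: silence, then speech, then silence.
energies = [0.1, 0.1, 0.2, 0.9, 1.2, 0.8, 0.1]
section = detect_voice_section(energies, threshold=0.5)
if section is not None:
    start, end = section
    t1 = measure_t1_ms(request_time_ms=0, capture_start_ms=0, start_frame=start)
    print(start, end, t1)  # 3 5 30
```

Integer milliseconds are used here so that the arithmetic is exact; a real device would take these times from its own clock.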

(i) The voice response control unit 6 receives from the timing measurement unit 3, together with T1, the time at which the end signal was input. According to the length of T1, the voice response control unit 6 then variably controls the length of the time T3 from input of the end signal to output of the voice response R2 (T3 in FIG. 2). That is, if T1 is shorter than a predetermined length of time, the user spoke immediately on hearing the "beep" tone, and is presumed to be proficient with the system or in a hurry. The response voice therefore needs to be output to the terminal early (step 18 in FIG. 3), and T3 is made shorter than its default length. If T1 is longer than the predetermined length of time, the user spoke only some time after hearing the "beep" tone, and is presumed to be unfamiliar with the system or to have time to spare. The response voice therefore needs to be output to the terminal later (step 19 in FIG. 3), and T3 is made longer than its default length.

(ii) The voice response control unit 6 variably controls the time taken to output the voice response R2 (the response speed) according to the length of T1 (T4 in FIG. 2). That is, if T1 is shorter than the predetermined length of time, the response speed is increased for the reason given above and the voice response R2 is output. If T1 is longer than the predetermined length of time, the response speed is reduced for the reason given above and R2 is output. When the voice response R2 is output by rule-based synthesis, the speed of the various synthesis parameters (accent, pitch, and so on) is controlled. When R2 is output by the recording-and-editing method, the response speed is controlled by selecting pre-recorded words or speech segments with different speaking rates.

(iii) The voice response control unit 6 controls the content (form of expression) of the voice response R2 according to the length of T1 (R2 in FIG. 2).

For example, suppose the user says "1" after hearing the "beep" tone. When the voice response R2 confirming this is output, if T1 is shorter than the predetermined length of time, the response "1" is output for the reason given above. If T1 is longer than the predetermined length of time, the response "1, is that right? Understood." is output for the reason given above. In other words, the voice response control unit 6 receives "1" from the voice recognition unit 4 as the recognition result of the input voice pattern, but varies the form of expression of the confirming voice response when outputting it.

Once the method of controlling T3, T4, and R2 has been determined by (i), (ii), and (iii) in this way (step 20 in FIG. 3), the voice response output unit 7 outputs the voice response R2 as instructed by the voice response control unit 6 (step 21 in FIG. 3).
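
The three control methods can be sketched together in Python as follows. This is only a minimal illustration: the 1.0 s threshold, the default T3, the scaling factors 0.5 and 1.5, the rate values, and the names `ResponsePlan` and `plan_response` are all assumptions, since the source specifies only "shorter" or "longer" than a predetermined length.

```python
from dataclasses import dataclass


@dataclass
class ResponsePlan:
    delay_t3_s: float   # (i) pause between end of input and start of response
    speech_rate: float  # (ii) relative speaking rate, 1.0 = default
    text: str           # (iii) wording of the confirmation


def plan_response(t1_s: float, recognized: str,
                  t1_threshold_s: float = 1.0,
                  default_t3_s: float = 1.0) -> ResponsePlan:
    """Choose T3, speaking rate, and wording from the measured T1."""
    if t1_s < t1_threshold_s:
        # Quick reply: user presumed experienced or in a hurry.
        return ResponsePlan(default_t3_s * 0.5, 1.25, recognized)
    if t1_s > t1_threshold_s:
        # Slow reply: user presumed unfamiliar or unhurried.
        return ResponsePlan(default_t3_s * 1.5, 0.8,
                            f"{recognized}, is that right? Understood.")
    # T1 exactly at the threshold: keep the defaults (an assumption).
    return ResponsePlan(default_t3_s, 1.0, recognized)


print(plan_response(0.3, "1"))
print(plan_response(2.0, "1"))
```

Here all three controls are applied at once; the same structure would allow any one of them to be applied on its own by leaving the other fields at their defaults.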

In this embodiment, as shown in the schematic diagram of FIG. 2, the time T3 from the voice input R1 to the voice response R2, the response time T4 of the voice response R2, and the form of expression of the voice response R2 are varied according to the time T1 from the input request signal P to the voice input R1. For users who are familiar with the system or in a hurry, the time until the response can be shortened, the response can be spoken faster, and its content can be made concise. For users who are unfamiliar with the system or have time to spare, the time until the response can be lengthened, the response can be spoken more slowly, and its content can be made more polite. The controls (i), (ii), and (iii) performed by the voice response control unit described above may also be applied in combination rather than individually.

In this way, dialogue between human and machine can be made smoother.

Next, a second embodiment of the present invention is described with reference to the drawings. FIG. 4 is a schematic configuration diagram of the second embodiment, and FIG. 5 is a processing flow diagram of the second embodiment. In the second embodiment, as shown in FIG. 2, the time T1 from the input request signal to the start of voice input and the utterance time T2 of the voice input R1 are detected, and the output of the voice response R2 is controlled accordingly. Compared with the configuration of FIG. 1, the configuration shown in FIG. 4 has the same analysis unit 1, voice section detection unit 2, timing measurement unit 3, voice recognition unit 4, dictionary memory 5, voice response control unit 6, voice response output unit 7, and voice input start request unit 8, with an utterance time measurement unit 9 added. That is, on detecting the start and end of the input voice pattern, the voice section detection unit 2 sends the start signal and the end signal to the timing measurement unit 3 and also to the utterance time measurement unit 9.

The utterance time measurement unit 9 obtains the time T2 from input of the start signal to input of the end signal (step 22 in FIG. 5). The voice response control unit 6 receives T1 obtained by the timing measurement unit 3 and T2 obtained by the utterance time measurement unit 9. The voice response control unit 6 compares T2 with a predetermined length of time (step 23 in FIG. 5) and controls the output of the voice response according to that result and the result of the comparison of T1 described above. That is, if the utterance time T2 is shorter than the predetermined length of time, the user is regarded as having spoken quickly because of familiarity with the system or being in a hurry, and, as described above, the times T3 and T4 shown in FIG. 2 are shortened or the content of the voice response R2 is made concise (step 24 in FIG. 5). If T2 is longer than the predetermined length of time, the user is regarded as having spoken slowly because of unfamiliarity with the system or having time to spare, and the times T3 and T4 shown in FIG. 2 are lengthened or the content of the voice response R2 is made more polite (step 25 in FIG. 5).
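
The source does not say how the T1 comparison and the T2 comparison are combined. The Python sketch below shows one possible rule, stated as an assumption: the response is made brisk only when both times are short, careful only when both are long, and left at the default otherwise. The thresholds and the names `is_short` and `choose_style` are hypothetical.

```python
def is_short(value_s: float, threshold_s: float) -> bool:
    """True when a measured time is below its predetermined length."""
    return value_s < threshold_s


def choose_style(t1_s: float, t2_s: float,
                 t1_threshold_s: float = 1.0,
                 t2_threshold_s: float = 0.5) -> str:
    """Combine the T1 and T2 comparisons into one response style.

    Assumed rule: agreement between the two measurements is required
    before departing from the default style.
    """
    t1_short = is_short(t1_s, t1_threshold_s)
    t2_short = is_short(t2_s, t2_threshold_s)
    if t1_short and t2_short:
        return "brisk"    # shorten T3 and T4, concise wording
    if not t1_short and not t2_short:
        return "careful"  # lengthen T3 and T4, polite wording
    return "default"


print(choose_style(0.3, 0.2))  # brisk
print(choose_style(2.0, 0.9))  # careful
print(choose_style(0.3, 0.9))  # default
```

Other combinations are equally compatible with the description, for example letting T2 alone decide the wording while T1 decides the timing.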

Thus, according to the second embodiment, the times T1 and T2 shown in FIG. 2 are measured and the output of the voice response R2 is controlled accordingly. Compared with the first embodiment, the response can better reflect the user's character and the circumstances of the utterance, which further increases the naturalness of the dialogue between user and machine.

In the first and second embodiments described above, the voice input start request signal P is output from the voice input start request unit 8. Alternatively, it may be output from the voice response output unit 7, with the response voice and the input request voice output in succession. That is, the user's utterances and the machine's responses follow one another continuously (dotted line in the flow of FIG. 5). FIG. 6 is a schematic diagram of the timing of response voices that include an input request and of input voices. In this figure, R0, R2, R4, and R6 are response voices, each including an input request, and R1, R3, and R5 are input voices from the user. In the balance inquiry service described above, for example, R0 is "Please say the digits of your account number one at a time, in order.", R1 is "1", R2 is "1, is that right? Understood. Next number, please.", and R3 is "2". Even when the response output method is modified in this way, as in the second embodiment, measuring the times T1, T5, and T9 from response voice to input voice and the utterance times T2, T6, and T10 of the input voices makes it possible to vary the times T3, T7, and T11 from input voice to response voice, the utterance times T4, T8, and T12 of the response voices, and the content of the response voices R0, R2, R4, and R6. Modifying the embodiments in this way speeds up voice input and response and also reduces line usage costs, which is of great economic value.

Note that the present invention is not limited to the above embodiments. For example, the timing measurement unit may measure both T1 and T2. The response output may also be changed only after the user's character has been clearly identified from history information on the time from the input request signal to the input voice, that is, after the time has been measured several times. Furthermore, utterance time measurement may be replaced by speaking rate measurement, and the response may be output not only as voice but also using a CRT, printer, or the like. Any of the various conventionally known methods may be adopted as appropriate for recognition of the input voice and for speech synthesis. In short, the present invention can be implemented with various modifications without departing from its gist.
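
The history-based variation mentioned above could be sketched in Python as follows. The minimum of three samples, the use of a mean, the 1.0 s threshold, and the class name `T1History` are assumptions for illustration; the source says only that the time is measured several times before the response is changed.

```python
from collections import deque
from statistics import mean


class T1History:
    """Keep recent T1 measurements and adapt only once enough exist."""

    def __init__(self, min_samples: int = 3, threshold_s: float = 1.0,
                 max_samples: int = 10) -> None:
        self.min_samples = min_samples
        self.threshold_s = threshold_s
        # Bounded history: the oldest measurements drop out automatically.
        self.samples = deque(maxlen=max_samples)

    def add(self, t1_s: float) -> None:
        self.samples.append(t1_s)

    def style(self) -> str:
        # Stay with the default response until the user's tendency is clear.
        if len(self.samples) < self.min_samples:
            return "default"
        return "brisk" if mean(self.samples) < self.threshold_s else "careful"


history = T1History()
for t1 in (0.2, 0.3, 0.4):
    history.add(t1)
    print(history.style())  # default, default, brisk
```

Averaging makes the decision less sensitive to a single unusually fast or slow reply than the per-utterance comparison of the first embodiment.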

[Brief explanation of the drawing]

FIG. 1 is a schematic configuration diagram of the first embodiment of the present invention, FIG. 2 is a schematic diagram of the timing of the input request, input voice, and response voice, FIG. 3 is a processing flow diagram of the first embodiment, FIG. 4 is a schematic configuration diagram of the second embodiment of the present invention, FIG. 5 is a processing flow diagram of the second embodiment, and FIG. 6 is a schematic diagram of the timing of the conversational continuous input and response format.

1: analysis unit; 2: voice section detection unit; 3: timing measurement unit; 4: voice recognition unit; 5: dictionary memory; 6: voice response control unit; 7: voice response output unit; 8: voice input start request unit; 9: utterance time measurement unit.

Agent: Patent attorney 則近 憲佑 (and one other)

Claims

[Scope of Claims] voice recognition means for recognizing the voice pattern analyzed by the detection means; voice output means for outputting a voice response signal and a voice input request signal based on the recognition result of the voice pattern output from the voice recognition means; and measuring means for measuring either or both of the time from the time when the voice input request signal output from the voice output means is input to the time when the start signal of the voice section signal output from the voice detection means is input, and the time from the time when the start signal of the voice section signal is input to the time when the end signal is input. A voice recognition response device that outputs a voice response signal by changing the manner in which the voice response signal is generated based on the following information.

2. The voice recognition response device according to claim 1, wherein the time period is one or more of the times from the time when the end signal of the voice section signal is input to the time when the voice output means outputs the voice response signal.