
JPS59216242A - Voice recognizing response device - Google Patents

Voice recognizing response device

Info

Publication number
JPS59216242A
JPS59216242A, JP58091809A, JP9180983A
Authority
JP
Japan
Prior art keywords
voice
response
speed
pattern
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP58091809A
Other languages
Japanese (ja)
Other versions
JPH0721759B2 (en)
Inventor
Yoichi Takebayashi
洋一 竹林
Hidenori Shinoda
篠田 英範
Teruhiko Ukita
浮田 輝彦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Corp
Original Assignee
Toshiba Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toshiba Corp filed Critical Toshiba Corp
Priority to JP58091809A priority Critical patent/JPH0721759B2/en
Publication of JPS59216242A publication Critical patent/JPS59216242A/en
Publication of JPH0721759B2 publication Critical patent/JPH0721759B2/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Links

Abstract

PURPOSE: To make natural and smooth interaction between a person and a machine possible by measuring the speaking rate of an input voice and controlling the output of the response voice in accordance with this speaking rate.

CONSTITUTION: For the input voice, the start and end of a word in the voice pattern are detected by an analyzer 1, a voice pattern memory 2, and a voice section detector 3. A pattern matching circuit 4 matches the voice pattern of the word data against the standard patterns of a plurality of words registered in advance in a word dictionary memory 5 to recognize the word. The recognition result of the input voice obtained by the pattern matching circuit 4 is given to a voice response output part 6 and a speaking rate measurer 7, and the measurer 7 uses the recognition result and the time length between the start and the end to look up the standard time length. The speaking rate is calculated from the average value and variance of the standard time lengths. A voice response speed controller 9 obtains this speaking-rate information and variably controls the speed of the response voice produced by the voice response output part 6.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Technical Field of the Invention]

The present invention relates to a voice recognition response device used in an information processing system operated by voice input.

[Technical background of the invention and its problems]

In recent years, there has been remarkable progress in speech recognition and speech synthesis technology. For example, continuous speech recognition and speaker-independent speech recognition have become possible, and highly accurate speech synthesis using linear predictive coding is now available. Rule-based synthesis methods for converting text into speech are also being actively researched and developed.

Using such technology, attempts have been made to develop, for example, telephone voice response service systems that provide various services over public telephone lines, and online business systems for banks and the like, and their usefulness is attracting attention.

However, the users of this kind of system are an unspecified, large number of people, including those unfamiliar with it, such as the elderly and children, as well as those who use it many times a day. Despite this, in conventional devices the content of the voice response is uniform and its speaking rate is fixed, so dialogue between the human and the machine has not been smooth. That is, the responses are redundant and irritating, or they are hard to understand.

[Purpose of the invention]

The present invention has been made in view of these circumstances, and its purpose is to provide a highly practical voice recognition response device that enables natural and smooth dialogue between humans and machines and thereby makes effective information processing by voice input possible.

[Summary of the invention]

In the present invention, when an input voice is recognized and a voice response is given, the speaking rate of the input voice is measured, and the output of the response voice, for example the response speech speed and the response content, is controlled in accordance with this speaking rate.

[Effects of the invention]

Thus, according to the present invention, the voice response output is controlled in accordance with the speaking rate of the input voice, so an appropriate response can be given to the person entering the voice input. For example, by giving a concise response to a frequent user and a polite, detailed response to an infrequent user, appropriate guidance for voice input can be provided, and the naturalness and smoothness of the dialogue can be greatly improved.

[Embodiments of the invention]

Embodiments of the present invention will be described below with reference to the drawings.

FIG. 1 is a schematic configuration diagram showing an apparatus according to the first embodiment. This apparatus takes words as the objects of speech recognition and controls the speed of the voice response according to the speaking rate of those words. That is, the input voice is subjected to A/D conversion, spectrum analysis, and other processing by the analyzer 1, converted into a sequence of feature parameters, and stored in the voice pattern memory 2.
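The patent leaves the analyzer's processing at "A/D conversion, spectrum analysis, etc.", so the following is only a minimal sketch of frame-based feature extraction, assuming a monaural PCM signal and an illustrative log band-energy representation; the frame sizes, band count, and function names are not taken from the patent.

```python
import numpy as np

def analyze(signal, sample_rate=8000, frame_ms=20, hop_ms=10, n_bands=16):
    """Convert a PCM waveform into a sequence of feature parameters:
    per-frame log energy plus a coarse log band spectrum."""
    frame = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    window = np.hamming(frame)
    features = []
    for start in range(0, len(signal) - frame + 1, hop):
        x = signal[start:start + frame] * window
        log_energy = float(np.log(np.sum(x * x) + 1e-10))
        # Collapse the magnitude spectrum into a few bands as a stand-in
        # for the spectrum analysis performed by analyzer 1.
        spectrum = np.abs(np.fft.rfft(x))
        bands = np.array_split(spectrum, n_bands)
        log_bands = [float(np.log(np.sum(b) + 1e-10)) for b in bands]
        features.append([log_energy] + log_bands)
    return np.array(features)
```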

The voice section detector 3 detects the start and end of a word in the voice pattern using, for example, the energy information of the feature parameter time series, and the word data portion is thereby extracted. The pattern matching circuit 4 then matches the voice pattern of this word data against the standard patterns of a plurality of words registered in advance in the word dictionary memory 5 to recognize the word. This pattern matching is performed, for example, by a similarity calculation method. The recognition result is given to the voice response output section 6.
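The patent says only that endpoint detection uses energy information and that matching uses "a similarity calculation method". The sketch below fills those gaps with an illustrative fixed energy threshold and a cosine similarity over linearly time-normalized patterns; a real system of the period might instead use dynamic time warping, which the patent does not specify.

```python
import numpy as np

def detect_word(features, threshold=None):
    """Endpoint detection: find the start and end frames of a word from the
    per-frame log energy (column 0 of the feature matrix)."""
    energy = features[:, 0]
    if threshold is None:
        threshold = energy.min() + 0.3 * (energy.max() - energy.min())
    voiced = np.where(energy > threshold)[0]
    return int(voiced[0]), int(voiced[-1])

def recognize(word_pattern, dictionary, n_frames=32):
    """Similarity matching: time-normalize the word pattern and each standard
    pattern to n_frames, then score with cosine similarity."""
    def normalize(pattern):
        idx = np.linspace(0, len(pattern) - 1, n_frames).astype(int)
        return pattern[idx].ravel()
    x = normalize(word_pattern)
    best_word, best_score = None, -np.inf
    for word, ref in dictionary.items():   # dictionary: word -> feature matrix
        r = normalize(ref)
        score = float(np.dot(x, r) /
                      (np.linalg.norm(x) * np.linalg.norm(r) + 1e-10))
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score
```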

Meanwhile, the recognition result of the input voice obtained by the pattern matching circuit 4 is given to the speaking rate measurer 7. Using the recognition result Wi of the input voice and the time length Li of the word indicated by the start and end information, the speaking rate measurer 7 looks up the standard time length Ri of the recognized word Wi registered in advance in the word duration memory 8, and calculates the speaking rate τ from the average value and variance. From this it is determined, for example, whether the speaking rate τ of the input voice is faster or slower than the average standard speaking rate; in other words, whether the person entering the voice input is a so-called fast talker, a standard talker, or a slow talker. The voice response speed controller 9 obtains this speaking-rate information and variably controls the speed of the response voice produced by the voice response output section 6.
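The patent states that τ is computed from the measured time lengths Li and the registered standard time lengths Ri using their average value and variance, but does not give the formula. The sketch below assumes τ is the mean of the ratios Li/Ri, with the variance returned as a confidence indicator; the vocabulary, durations, and thresholds are hypothetical.

```python
import numpy as np

# Hypothetical word duration memory: standard time length Ri in seconds
# for each vocabulary word (values are illustrative).
WORD_DURATIONS = {"hai": 0.40, "iie": 0.45, "furikomi": 0.85}

def speaking_rate(recognized_words, durations=WORD_DURATIONS):
    """Estimate the speaking rate tau from recognized words and their measured
    durations Li.  recognized_words: list of (word, start_s, end_s).
    Here tau is the mean of Li/Ri, so tau < 1.0 means faster than standard."""
    ratios = [(end - start) / durations[word]
              for word, start, end in recognized_words
              if word in durations]
    if not ratios:
        return 1.0, 0.0                    # no evidence: assume standard rate
    return float(np.mean(ratios)), float(np.var(ratios))

def classify_rate(tau, fast=0.85, slow=1.15):
    """Map tau onto the fast / standard / slow categories used in the text.
    The thresholds are illustrative, not taken from the patent."""
    if tau < fast:
        return "fast"
    if tau > slow:
        return "slow"
    return "standard"
```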

As a result, the voice response output section 6 produces a voice response in which the output speed of the response sentence, determined according to the recognition result of the input voice, is variably controlled according to the speaking rate of the input voice.

At this time, when the response voice is synthesized and output by a rule synthesis method, the response speech speed is variably controlled by controlling the rate of change of the various parameters used for rule synthesis. When recording-and-editing type speech synthesis is used, the response speech speed is controlled, for example, by selecting pre-recorded sentences or speech segments with different speaking rates.
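The patent describes these two realizations of the speed control without further detail, so both sketches below are assumptions: the parameter-track representation used for the rule-synthesis case and the rate-keyed recording table used for the recording-and-editing case are illustrative.

```python
def stretch_parameter_track(frames, tau):
    """Rule-synthesis case: scale the duration attached to each synthesis
    parameter frame by the measured tau, so a fast talker (tau < 1.0) gets a
    faster response.  frames: list of (params, duration_s) tuples."""
    return [(params, duration * tau) for params, duration in frames]

def pick_recording(recordings, tau):
    """Recording-and-editing case: choose the pre-recorded variant whose
    nominal speaking rate is closest to tau.  recordings: dict mapping a
    nominal rate (e.g. 0.8, 1.0, 1.2) to a recorded clip or segment list."""
    nominal = min(recordings, key=lambda rate: abs(rate - tau))
    return recordings[nominal]
```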

Thus, according to the apparatus configured in this way, the voice response is given at a speaking rate that corresponds to the speaking rate of the person entering the voice input. It therefore becomes possible to respond rapidly to a so-called impatient, fast-talking person and at a slower rate to a leisurely, slow-talking person, which increases the naturalness of the dialogue between the human and the machine and makes it smoother. As a result, overall, the efficiency of information processing by voice recognition response can be improved.

FIG. 2 is a schematic configuration diagram showing a second embodiment of the present invention, in which the pitch frequency of the input voice is obtained, the speaking rate is detected from its variation, and the content of the response sentence itself is changed accordingly. That is, the input voice is analyzed by the analyzer 11, and its voice pattern is stored in the voice pattern memory 12. The speech recognition section 13 recognizes this voice pattern by referring to the speech dictionary registered in the dictionary memory 14.

Meanwhile, the pitch frequency component of the input voice is detected via the pitch extractor 15.

This pitch frequency component is detected independently of the recognition processing of the input voice, using, for example, the cepstrum method, the modified correlation method, or the AMDF method. From the change in the time-series pattern of this pitch frequency, the speaking rate measurer 16 obtains the speaking rate τ of the input voice. The voice response control section 17 receives this speaking-rate information together with the recognition result of the input voice, determines a response sentence whose content corresponds to them, and this response is output as speech via the voice response output section 18.
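The patent names cepstrum and modified-correlation pitch extraction and says only that τ is obtained from the change in the pitch time-series pattern. The sketch below substitutes a plain autocorrelation pitch tracker and one plausible reading of that phrase, counting voiced pitch runs per second of voiced speech; both substitutions are assumptions, not the patent's method.

```python
import numpy as np

def pitch_track(signal, sample_rate=8000, frame_ms=30, hop_ms=10,
                fmin=60.0, fmax=400.0):
    """Per-frame pitch estimate by plain autocorrelation (a stand-in for the
    cepstrum or modified-correlation methods named in the text).
    Unvoiced frames are reported as 0."""
    frame = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    lag_min, lag_max = int(sample_rate / fmax), int(sample_rate / fmin)
    pitches = []
    for start in range(0, len(signal) - frame + 1, hop):
        x = signal[start:start + frame].astype(float)
        x -= x.mean()
        ac = np.correlate(x, x, mode="full")[frame - 1:]
        if ac[0] <= 0:
            pitches.append(0.0)
            continue
        lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
        voiced = ac[lag] / ac[0] > 0.3           # crude voicing decision
        pitches.append(sample_rate / lag if voiced else 0.0)
    return np.array(pitches)

def rate_from_pitch(pitches, hop_s=0.010):
    """Illustrative rate measure: voiced pitch runs (roughly syllable-sized
    events) per second of voiced speech."""
    voiced = (pitches > 0).astype(int)
    runs = int(np.sum(np.diff(np.concatenate(([0], voiced))) == 1))
    voiced_time = float(voiced.sum()) * hop_s
    return runs / voiced_time if voiced_time > 0 else 0.0
```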

That is, depending on the speech recognition result and the speaking rate, one of several responses that express the same meaning but differ in form, for example "ありがとう" ("Thanks"), "ありがとうございます" ("Thank you"), and "ありがとうございました、またどうぞ" ("Thank you very much, please come again"), is selected and output as speech. In other words, a voice response whose content and speed suit the person entering the voice input is produced.
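The patent gives only the three example sentences above; the response table, category names, and selection rule in the sketch below are illustrative assumptions about how such a selection could be keyed to the measured speaking-rate category.

```python
# Hypothetical response table: one meaning, several surface forms that differ
# only in how terse or polite they are.
RESPONSES = {
    "thanks": {
        "fast":     "ありがとう",                         # terse, for fast talkers
        "standard": "ありがとうございます",
        "slow":     "ありがとうございました、またどうぞ",  # polite, for slow talkers
    },
}

def choose_response(meaning, rate_category, table=RESPONSES):
    """Select the surface form of a response according to the measured
    speaking-rate category ("fast", "standard", or "slow")."""
    variants = table[meaning]
    return variants.get(rate_category, variants["standard"])
```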

Therefore, when the voice response gives the next voice input instruction, the instruction can be given concisely, or given carefully and politely to an unfamiliar user, so that the naturalness of the dialogue is increased and the processing efficiency is improved.

As described above, according to the present invention, the speaking rate, which well reflects the character of the person entering the voice input, is detected and the voice response is controlled accordingly, so the naturalness of the dialogue with that person can be increased.

As a result, great practical effects are obtained, such as eliminating problems like irritating the person entering the voice input.

The present invention is not limited to the above embodiments. For example, the speaking rate of the input voice may be measured using vowel similarity or the temporal change of the spectrum, or, when continuously spoken digits are the recognition target, using the lengths of the pauses between them. The voice response may also be controlled by controlling the speaking rate together with the content of the response sentence. For the speech recognition processing and the speech synthesis, various conventionally known methods may be employed as appropriate. In short, the present invention can be implemented with various modifications without departing from its gist.
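The pause-based measure is mentioned only in passing, so the sketch below is an assumption about how it could look: the nominal pause length and the ratio form are illustrative choices, not taken from the patent.

```python
import numpy as np

def rate_from_pauses(digit_segments, nominal_pause_s=0.25):
    """Alternative measure for continuously spoken digits: compare the pauses
    between recognized digits with a nominal pause length.
    digit_segments: list of (digit, start_s, end_s) in time order.
    Returns a tau-like value; < 1.0 means shorter pauses than nominal."""
    pauses = [next_start - prev_end
              for (_, _, prev_end), (_, next_start, _)
              in zip(digit_segments, digit_segments[1:])]
    if not pauses:
        return 1.0
    return float(np.mean(pauses)) / nominal_pause_s
```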

[Brief description of the drawings]

FIG. 1 is a schematic configuration diagram of an apparatus according to the first embodiment of the present invention, and FIG. 2 is a schematic configuration diagram of an apparatus according to the second embodiment of the present invention.

1, 11: analyzer; 2, 12: voice pattern memory; 3: voice section detector; 4: pattern matching circuit; 5: word dictionary memory; 6, 18: voice response output section; 7, 16: speaking rate measurer; 8: word duration memory; 9: voice response speed controller; 13: speech recognition section; 14: dictionary memory; 15: pitch extractor; 17: voice response control section.

Claims (2)

[Claims]

(1) A voice recognition response device which recognizes an input voice and outputs a voice response to the recognition result, characterized in that the speaking rate of the input voice is measured and the voice response output is controlled in accordance with this speaking rate.

(2) The voice recognition response device according to claim 1, wherein the voice response output is controlled by varying the voice response speed or by changing the content of the voice response sentence.
JP58091809A 1983-05-25 1983-05-25 Speech recognition response device Expired - Lifetime JPH0721759B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP58091809A JPH0721759B2 (en) 1983-05-25 1983-05-25 Speech recognition response device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP58091809A JPH0721759B2 (en) 1983-05-25 1983-05-25 Speech recognition response device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
JP4139390A Division JPH0731508B2 (en) 1992-05-29 1992-05-29 Speech recognition response device

Publications (2)

Publication Number Publication Date
JPS59216242A true JPS59216242A (en) 1984-12-06
JPH0721759B2 JPH0721759B2 (en) 1995-03-08

Family

ID=14036950

Family Applications (1)

Application Number Title Priority Date Filing Date
JP58091809A Expired - Lifetime JPH0721759B2 (en) 1983-05-25 1983-05-25 Speech recognition response device

Country Status (1)

Country Link
JP (1) JPH0721759B2 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS62145294A (en) * 1985-12-20 1987-06-29 株式会社東芝 Voice notice unit
JPH01169660A (en) * 1987-12-25 1989-07-04 Toshiba Corp Pattern generating device
EP1081683A1 (en) * 1999-08-30 2001-03-07 Philips Corporate Intellectual Property GmbH Speech recognition method and device
US8364475B2 (en) 2008-12-09 2013-01-29 Fujitsu Limited Voice processing apparatus and voice processing method for changing accoustic feature quantity of received voice signal
US10157607B2 (en) 2016-10-20 2018-12-18 International Business Machines Corporation Real time speech output speed adjustment

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008026463A (en) * 2006-07-19 2008-02-07 Denso Corp Voice interaction apparatus
JP2012128440A (en) * 2012-02-06 2012-07-05 Denso Corp Voice interactive device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5757375A (en) * 1981-08-06 1982-04-06 Noriko Ikegami Electronic translator
JPS59153238A (en) * 1983-02-21 1984-09-01 Nec Corp Voice input/output system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5757375A (en) * 1981-08-06 1982-04-06 Noriko Ikegami Electronic translator
JPS59153238A (en) * 1983-02-21 1984-09-01 Nec Corp Voice input/output system

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS62145294A (en) * 1985-12-20 1987-06-29 株式会社東芝 Voice notice unit
JPH0631998B2 (en) * 1985-12-20 1994-04-27 株式会社東芝 Voice notification device
JPH01169660A (en) * 1987-12-25 1989-07-04 Toshiba Corp Pattern generating device
EP1081683A1 (en) * 1999-08-30 2001-03-07 Philips Corporate Intellectual Property GmbH Speech recognition method and device
US6629072B1 (en) 1999-08-30 2003-09-30 Koninklijke Philips Electronics N.V. Method of an arrangement for speech recognition with speech velocity adaptation
US8364475B2 (en) 2008-12-09 2013-01-29 Fujitsu Limited Voice processing apparatus and voice processing method for changing accoustic feature quantity of received voice signal
US10157607B2 (en) 2016-10-20 2018-12-18 International Business Machines Corporation Real time speech output speed adjustment

Also Published As

Publication number Publication date
JPH0721759B2 (en) 1995-03-08

Similar Documents

Publication Publication Date Title
Deshwal et al. Feature extraction methods in language identification: a survey
US20240038214A1 (en) Attention-Based Clockwork Hierarchical Variational Encoder
CN112489629B (en) Voice transcription model, method, medium and electronic equipment
US11475874B2 (en) Generating diverse and natural text-to-speech samples
JP2815579B2 (en) Word candidate reduction device in speech recognition
EP0535146A4 (en)
DE112021000959T5 (en) Synthetic Language Processing
WO2023279976A1 (en) Speech synthesis method, apparatus, device, and storage medium
WO2021118793A1 (en) Speech processing
Kumar et al. Machine learning based speech emotions recognition system
Nose et al. HMM-based expressive singing voice synthesis with singing style control and robust pitch modeling
Wu et al. Multilingual text-to-speech training using cross language voice conversion and self-supervised learning of speech representations
Nedjah et al. Automatic speech recognition of Portuguese phonemes using neural networks ensemble
Prasanna et al. Comparative deep network analysis of speech emotion recognition models using data augmentation
JPS59216242A (en) Voice recognizing response device
Rusan et al. Human-Computer Interaction Through Voice Commands Recognition
Kwon et al. Effective parameter estimation methods for an excitnet model in generative text-to-speech systems
JP5300000B2 (en) Articulation feature extraction device, articulation feature extraction method, and articulation feature extraction program
Chen et al. A new learning scheme of emotion recognition from speech by using mean fourier parameters
Bansal et al. Automatic speech recognition by cuckoo search optimization based artificial neural network classifier
Galajit et al. ThaiSpoof: A Database for Spoof Detection in Thai Language
Heo et al. Classification based on speech rhythm via a temporal alignment of spoken sentences
Alastalo Finnish end-to-end speech synthesis with Tacotron 2 and WaveNet
Khan et al. detection of questions in Arabic audio monologues using prosodic features
Kondhalkar et al. Speech recognition using novel diatonic frequency cepstral coefficients and hybrid neuro fuzzy classifier