JP5046589B2

JP5046589B2 - Telephone system, call assistance method and program

Info

Publication number: JP5046589B2
Application number: JP2006240473A
Authority: JP
Inventors: 剛加藤
Original assignee: NEC Communication Systems Ltd
Current assignee: NEC Communication Systems Ltd
Priority date: 2006-09-05
Filing date: 2006-09-05
Publication date: 2012-10-10
Anticipated expiration: 2026-09-05
Also published as: JP2008066866A

Abstract

PROBLEM TO BE SOLVED: To provide a system, method and program for enabling a hearing-impaired person or the like to correctly understand what the other party says on the telephone and to converse smoothly by telephone.

SOLUTION: The telephone system comprises a call control part 1 connected to a line for controlling the connection of a call, and a service center 4 connected to the call control part and provided with a voice recognition part 2 and an image preparation part 3. When the call from a caller is connected through the call control part to the voice recognition part of the service center, the voice recognition part recognizes the speech contents of a subscriber, converts the voice recognized result to character information, adds the reading information of the voice recognized result, and delivers them through the call control part to the image preparation part. The image preparation part prepares image data in which the voice recognized result and the reading information are combined and delivers it to the call control part. The call control part transmits the image data to the terminal of the subscriber on the call termination side, where the voice recognized result of the speech contents and the reading information are displayed on a screen.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a telephone system, a call assistance method, and a program that perform voice recognition on the content of a call and display the result on a terminal.

A telephone system of this kind, which recognizes the content of a call by voice recognition and displays it on a terminal, is used, for example, as a telephone assistance system for hearing-impaired persons. In its typical basic configuration, the caller's utterance is recognized by a voice recognition device and displayed on a character display device. Conventional systems fall broadly into two types.

The first type provides a voice recognition function inside a dedicated terminal, which recognizes the caller's voice arriving over the line and displays the result on a display device in that terminal (the "terminal type").

The second type provides a voice recognition device on the center side, which transmits the voice together with character information (text information) to a terminal with a character display function, where it is displayed (the "center type").

Patent Document 1 discloses a multimedia public telephone system in which the other party's voice is received by a voice signal receiving means via a communication system. When subtitle reception is selected with a voice/character conversion switch, a voice recognition device recognizes the received voice signal, converts the recognized speech into character data, and displays it as subtitles on a display. Conversely, the system includes means for synthesizing speech from a character string entered by the user and transmitting it to the other party. Patent Document 2 discloses a character input device in which voice input from the microphone of a mobile phone is converted into digital voice data and supplied to a voice recognition unit of a computer. The recognition result is returned to the mobile phone, edited by its character editing unit, and shown on its display. When a command to correct the recognition result is entered from the button operation unit of the mobile phone, it is transmitted to the computer via a communication line such as the Internet, and the corrected result is returned and displayed. When kana-kanji conversion is instructed, the conversion is performed by the character editing unit of the computer (center), and the result is returned and displayed. Patent Document 3 discloses a voice typewriter that makes it easy to specify the cursor position when editing voice-input text, and that has a display area for the voice-recognized Japanese text and an input text display area.

Patent Document 1: Japanese Patent Laid-Open No. 10-224520
Patent Document 2: Japanese Patent Laid-Open No. 2003-304331
Patent Document 3: Japanese Examined Patent Publication No. 6-103457

However, these conventional systems have the following problems in actual operation.

Conversation spoken freely over the telephone by an unspecified large number of people is among the more difficult tasks for voice recognition. In addition, the quality of voice that has passed through the telephone network is markedly inferior to that of voice input directly from a microphone. The causes include degradation from encoding during transmission over the telephone network, line echo, and the difficulty of removing noise.

If a misrecognized result is further converted into mixed kana-kanji text, noise such as conversion errors is added, making it even harder to infer what was originally said.

Various voice recognition devices for hearing-impaired persons that recognize telephone speech and display it as text, and telephone systems equipped with such devices, have been proposed, but none has reached full-scale practical use. This is because recognizing telephone speech utterance by utterance is generally difficult, and degraded performance due to misrecognition is expected.

For hearing-impaired persons, understanding what is said on the telephone is a pressing problem, and an early solution and practical implementation are desired.

Accordingly, an object of the present invention is to provide a system, a method, and a program that enable a hearing-impaired person or the like to accurately understand what the other party says on the telephone, thereby enabling smooth telephone conversation.

To solve the above problems, the invention disclosed in the present application is generally configured as follows.

A system according to one aspect of the present invention comprises: means for performing voice recognition on a voice signal input from a first terminal; means for generating reading information for the voice recognition result; and means for causing at least the reading information to be displayed on a second terminal, which is the call partner of the first terminal.

The system according to the present invention comprises means for creating screen data that includes character information of the voice recognition result and the reading information, and the screen data is transmitted to the second terminal.

In the system according to the present invention, the means for voice recognition, the means for generating the reading information, and the means for creating the screen data are provided in a service center connected to the line via a call connection unit.

The system according to the present invention comprises a voice recognition unit that includes the means for voice recognition and the means for generating the reading information. The voice recognition unit receives telephone voice from the first terminal, recognizes it, converts the recognition result into character information (text), and generates reading information for the recognition result. The system further comprises an image data creation unit that creates screen data including the recognition result converted into text and the reading information. The screen data is transmitted to the second terminal, and a screen containing the text recognition result and the reading information is displayed on the second terminal.

In the system according to the present invention, the second terminal may include the means for generating the reading information for the voice recognition result.

A system according to the present invention comprises a call control unit that is connected to a line and performs call connection control, and a service center that is connected to the call control unit and includes a voice recognition unit and a screen creation unit. When a call from a caller is connected to the voice recognition unit of the service center via the call control unit, the voice recognition unit recognizes the content of the caller's speech, converts the recognition result into character information, adds reading information for the recognition result, and passes them to the screen creation unit via the call control unit. The screen creation unit creates screen data combining the recognition result and its reading information and passes it to the call control unit, which transmits it to the terminal of the called subscriber. The called subscriber's terminal displays the voice recognition result of the call content together with the reading information.

In the system according to the present invention, the service center includes a reading generation unit separate from the voice recognition unit. After the call is connected, the voice recognition unit of the service center recognizes the content of the caller's speech and passes the recognition result to the call control unit. The call control unit sends the recognition result to the reading generation unit of the service center, which estimates the reading from the recognition result, generates reading information, and sends it to the call control unit. The call control unit sends the character information of the recognition result output from the voice recognition unit, together with the reading information from the reading generation unit, to the screen creation unit of the service center, which creates screen data combining the recognition result and its reading information.

In the system according to the present invention, a reading generation unit separate from the voice recognition unit may instead be provided in the called terminal. In this case the voice recognition unit of the service center does not generate reading information, the screen creation unit creates screen data for the voice recognition result, and the reading generation unit of the called terminal estimates the reading from the recognition result and generates the reading information.
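The arrangements described so far differ mainly in where the reading information is generated: inside the voice recognition unit, in a separate reading generation unit at the service center, or in the called terminal. The following Python sketch lists the processing stages for each arrangement. It is only an illustration; `ReadingPlacement`, `plan_stages`, and the stage labels are hypothetical names that do not appear in the patent.

```python
from enum import Enum


class ReadingPlacement(Enum):
    """Where the reading information is generated (hypothetical labels)."""
    IN_RECOGNIZER = "voice recognition unit adds the reading itself"
    IN_SERVICE_CENTER = "separate reading generation unit in the service center"
    IN_TERMINAL = "reading generation unit in the called terminal"


def plan_stages(placement: ReadingPlacement) -> list[str]:
    """Return the ordered processing stages for one utterance."""
    stages = ["call control: receive caller audio"]
    if placement is ReadingPlacement.IN_RECOGNIZER:
        stages.append("recognizer: text + reading")
    else:
        stages.append("recognizer: text only")
    if placement is ReadingPlacement.IN_SERVICE_CENTER:
        # The call control unit relays the text to the reading generation unit.
        stages.append("reading generator (center): estimate reading from text")
    stages.append("screen creation: build screen data")
    stages.append("call control: send screen data to called terminal")
    if placement is ReadingPlacement.IN_TERMINAL:
        stages.append("reading generator (terminal): estimate reading from text")
    stages.append("called terminal: display text and reading")
    return stages
```

Calling `plan_stages(ReadingPlacement.IN_TERMINAL)` shows that in the terminal-side arrangement the screen data leaves the service center without a reading, and the reading is added only just before display.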

In the present invention, voice from the called terminal is transmitted to the calling terminal, where it is received and played back.

A service center according to another aspect of the present invention comprises a voice recognition unit connected to a call control unit, and a screen creation unit. When a call from a caller is connected to the voice recognition unit via the call control unit, the voice recognition unit recognizes the content of the caller's speech, converts the recognition result into character information, adds reading information for the recognition result, and passes them to the screen creation unit via the call control unit. The screen creation unit creates screen data combining the recognition result and its reading information and passes it to the call control unit, and the screen data is transmitted via the call control unit to the terminal of the called subscriber.

The service center according to the present invention may further comprise a reading generation unit. After the call is connected, the voice recognition unit recognizes the content of the caller's speech and sends the recognition result to the reading generation unit via the call control unit. The reading generation unit estimates the reading from the recognition result and sends the reading information to the call control unit. The call control unit sends the output of the voice recognition unit and the reading information from the reading generation unit to the screen creation unit of the service center, which creates screen data combining the recognition result and its reading information.

In the present invention, the character information is the voice recognition result after kana-kanji conversion, and the reading information is the reading estimated from the voice recognition result, written in at least one of hiragana, romaji (Latin letters), and phonetic symbols.
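To make the distinction between character information and reading information concrete, the following Python sketch pairs a kana-kanji recognition result with its reading in hiragana and romaji. It is a minimal illustration rather than the patented method: the kana table is deliberately tiny, and `KANA_TO_ROMAJI`, `hiragana_to_romaji`, and `annotate` are hypothetical names. The example words 記者, 汽車, 貴社, and 帰社 are ordinary Japanese homophones that are all read きしゃ.

```python
# Deliberately tiny, hypothetical kana table; a real system needs the full
# syllabary plus rules for long vowels, the small "tsu", and so on.
KANA_TO_ROMAJI = {
    "しゃ": "sha",
    "か": "ka",
    "き": "ki",
    "い": "i",
}

# Homophones that share the hiragana reading "きしゃ".
HOMOPHONES = ["記者", "汽車", "貴社", "帰社"]


def hiragana_to_romaji(reading: str) -> str:
    """Convert a hiragana reading to romaji, trying two-character kana first."""
    out = []
    i = 0
    while i < len(reading):
        pair = reading[i:i + 2]
        if len(pair) == 2 and pair in KANA_TO_ROMAJI:
            out.append(KANA_TO_ROMAJI[pair])
            i += 2
        else:
            # Characters missing from the table are passed through unchanged.
            out.append(KANA_TO_ROMAJI.get(reading[i], reading[i]))
            i += 1
    return "".join(out)


def annotate(text: str, hiragana: str) -> dict:
    """Bundle the character information with its reading in two notations."""
    return {"text": text, "hiragana": hiragana, "romaji": hiragana_to_romaji(hiragana)}


annotated = [annotate(word, "きしゃ") for word in HOMOPHONES]
```

Whichever of the four words the kana-kanji conversion selects, the displayed reading is the same, so a reader who sees an implausible word can fall back on the reading to work out what was said.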

A method according to yet another aspect of the present invention includes:
a step of performing voice recognition on a voice signal input from a first terminal;
a step of generating reading information for the voice recognition result; and
a step of causing at least the reading information to be displayed on a second terminal, which is the call partner of the first terminal.

The method according to the present invention includes a step of creating screen data that includes character information of the voice recognition result and the reading information, and the screen data is transmitted to the second terminal.

In the method according to the present invention, the second terminal may generate the reading information for the voice recognition result.

A method according to the present invention is a call assistance method for a telephone system comprising a call control unit that is connected to a line and performs call connection control, and a service center that is connected to the call control unit and includes a voice recognition unit and a screen creation unit. When a call from a caller is connected to the voice recognition unit of the service center via the call control unit, the voice recognition unit recognizes the content of the caller's speech, converts the recognition result into character information, adds reading information for the recognition result, and passes them to the screen creation unit via the call control unit. The screen creation unit creates screen data combining the recognition result and its reading information and passes it to the call control unit, which transmits it to the terminal of the called subscriber. The called subscriber's terminal displays the voice recognition result of the call content together with the reading information.

In the method according to the present invention, the service center may include a reading generation unit separate from the voice recognition unit. After the call is connected, the voice recognition unit of the service center recognizes the content of the caller's speech and passes the recognition result to the call control unit. The call control unit sends the recognition result to the reading generation unit of the service center, which estimates the reading from the recognition result and sends the reading information to the call control unit. The call control unit sends the character information of the recognition result output from the voice recognition unit, together with the reading information from the reading generation unit, to the screen creation unit of the service center, which creates screen data combining the recognition result and its reading information.

A program according to the present invention causes a computer to execute:
a process of performing voice recognition on a voice signal input from a first terminal;
a process of generating reading information for the voice recognition result; and
a process of causing at least the reading information to be displayed on a second terminal, which is the call partner of the first terminal.

The program according to the present invention further causes the computer to execute a process of creating screen data that includes character information of the voice recognition result and the reading information.

A program according to the present invention is for a computer constituting a service center that comprises a voice recognition unit connected to a call control unit, and a screen creation unit. In this service center, when a call from a caller is connected to the voice recognition unit via the call control unit, the voice recognition unit recognizes the content of the caller's speech, converts the recognition result into character information, adds reading information for the recognition result, and passes them to the screen creation unit via the call control unit. The screen creation unit creates screen data combining the recognition result and its reading information and passes it to the call control unit, and the screen data is transmitted via the call control unit to the terminal of the called subscriber. The program causes the computer to execute the processing of the voice recognition unit and the screen creation unit.

In the program according to the present invention, the service center may include a reading generation unit separate from the voice recognition unit. After the call is connected, the voice recognition unit recognizes the content of the caller's speech, converts the recognition result into character information, and sends it to the reading generation unit via the call control unit. The reading generation unit estimates the reading from the recognition result and sends the reading information to the call control unit. The program causes the computer constituting the service center to execute the processing of the voice recognition unit, the screen creation unit, and the reading generation unit.

According to the present invention, a hearing-impaired person (the called party) can converse smoothly by telephone with a hearing person (the caller). The reason is that, even if the voice recognition device outputs an erroneous recognition result, the hearing-impaired person (the called party) can infer the likely error from the reading and correctly understand what was said.

The best mode for carrying out the present invention is described below. In recent years, telephone terminals with a screen display function, such as L-mode terminals and IP phones using an IP network, have come into use. These can display not only character information (text information) but also World Wide Web (WWW) pages. With such a terminal, a hearing-impaired person can already use the telephone line for e-mail and the like. If, however, the caller's utterance were displayed on the screen, a hearing-impaired person who is able to speak could use the telephone in its original sense. As noted above, though, recognizing conversational speech over the telephone is difficult, and considerable misrecognition and misconversion can be expected.

Accordingly, the present invention comprises a voice recognition unit (2) that recognizes the utterance from the calling terminal (5) and converts it into character information and its reading information; a screen creation unit (3) that creates a screen for a terminal with a screen display; and a call control unit (1) that manages the call and controls the voice recognition unit (2) and the screen creation unit (3). The screen data created by the screen creation unit (3) contains not only the speaker's utterance (the voice recognition result) but also the reading information, and is transmitted to and displayed on the called terminal (6). The reading information may take any form that represents the reading of the character information of the voice recognition result, such as hiragana, katakana, romaji, or phonetic symbols.

When the present invention is applied to a telephone assistance system for hearing-impaired persons, the system comprises a telephone terminal, a public network, a voice recognition unit, and a telephone terminal capable of displaying characters and images. The uttered speech is recognized, converted into character information consisting of the recognition result with the reading of the speech added, and displayed on the hearing-impaired person's telephone terminal with a display function, enabling the hearing-impaired person to converse by telephone.

In outline, the invention operates as follows. When a call from the calling terminal (5) is connected via the call control unit (1) to the voice recognition unit (2) of the service center (4), the voice recognition unit (2) recognizes the subscriber's speech and converts it into character information. Because misrecognition is possible, it attaches the reading information to the recognition result and passes both to the screen creation unit (3). The screen creation unit (3) composes a screen combining the recognition result and its reading information and transmits it to the terminal of the called subscriber. The terminal (6) of the called service subscriber displays the received screen (the voice recognition result of the call content and the reading information). Thus, even if recognition or conversion is wrong, the reader can compare the text with the reading information, which makes it easier to work out what was actually said. A reading generation unit that generates the reading information may be provided separately from the voice recognition unit (2). Alternatively, the terminal (6) of the called subscriber may estimate the reading from the recognition result and generate the reading information. Embodiments are described below.
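The flow in this overview can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the patented implementation: the recognizer is a stub that always returns the same result, and `RecognitionOutput`, `recognize`, `create_screen`, and `handle_caller_audio` are hypothetical names. The stub returns 橋 ("bridge"), one of several Japanese words read はし, to stand for a possibly wrong kana-kanji choice.

```python
from dataclasses import dataclass


@dataclass
class RecognitionOutput:
    text: str      # kana-kanji converted recognition result (may be wrong)
    reading: str   # reading of the recognized speech, here in hiragana


def recognize(audio: bytes) -> RecognitionOutput:
    """Stub for voice recognition unit (2): always returns a fixed result."""
    return RecognitionOutput(text="橋", reading="はし")


def create_screen(result: RecognitionOutput) -> str:
    """Stub for screen creation unit (3): combine the text and its reading."""
    return f"{result.text} ({result.reading})"


def handle_caller_audio(audio: bytes, send_to_called_terminal) -> None:
    """Role of call control unit (1): route audio to recognition, then
    route the finished screen to the called terminal (6)."""
    result = recognize(audio)
    screen = create_screen(result)
    send_to_called_terminal(screen)


# The called terminal is modelled as a list that collects displayed screens.
displayed = []
handle_caller_audio(b"\x00\x01", displayed.append)
```

After the call, `displayed` holds one entry that shows the recognized word next to its reading, which is the information the called party uses to recover from a wrong conversion.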

FIG. 1 shows the configuration of one embodiment of the present invention. Referring to FIG. 1, the telephone assistance system of this embodiment comprises a call control unit 1 and a service center 4 that includes a voice recognition unit 2 and a screen creation unit 3. The processing of the voice recognition unit 2 and the screen creation unit 3 may be implemented by a program executed on a computer.

FIG. 2 shows an example of the configuration of the call control unit 1. Referring to FIG. 2, the call control unit 1 comprises a control unit 10, a voice storage unit 11, and a media conversion unit 12. It manages the call and transmits the received voice to the voice recognition unit 2.

The control unit 10 performs overall control of the call control unit 1, managing incoming calls and transmitting and receiving voice.

The voice storage unit 11 stores voice from both the calling side and the called side, and also holds guidance message audio for the caller.

The media conversion unit 12 has a protocol conversion function and converts voice data for both the called side and the calling side.

The control unit 10 manages the call, temporarily stores the received voice in the voice storage unit 11, and then transmits it to the voice recognition unit 2.

Referring again to FIG. 1, the voice recognition unit 2 recognizes the received voice and sends its output (the voice recognition result and the reading information) to the screen creation unit 3 via the control unit 10 (see FIG. 2) of the call control unit 1. In this embodiment, the voice recognition unit 2 performs voice recognition by any known method, for example one for recognizing the speech of unspecified speakers. It then applies, for example, kana-kanji conversion to the recognition result whose words have been determined using a word dictionary, generating a sentence (text) corresponding to the utterance, and adds reading information to the character information of the recognition result.

The screen creation unit 3 converts the received output of the voice recognition unit 2 (the voice recognition result and the reading information) into a form that a telephone terminal with a screen display function can display, mainly a page description language such as HTML (HyperText Markup Language).
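As an illustration of what such screen data might look like, the following Python sketch renders the recognition result and its reading as an HTML fragment. The patent only says that a page description language such as HTML is used; the choice of the HTML `<ruby>`, `<rt>`, and `<rp>` elements, the per-segment pairing of text and reading, and the function name `build_screen_html` are assumptions made for this example.

```python
from html import escape


def build_screen_html(segments: list[tuple[str, str]]) -> str:
    """Render (text, reading) pairs as an HTML fragment using ruby annotation."""
    parts = []
    for text, reading in segments:
        if reading:
            # <rp> supplies fallback parentheses for terminals without ruby support.
            parts.append(
                f"<ruby>{escape(text)}<rp>(</rp><rt>{escape(reading)}</rt><rp>)</rp></ruby>"
            )
        else:
            # Segments with no separate reading (e.g. kana-only text) are emitted as-is.
            parts.append(escape(text))
    return "<p>" + "".join(parts) + "</p>"


fragment = build_screen_html([("明日", "あした"), ("の", ""), ("会議", "かいぎ")])
```

Escaping both the text and the reading keeps recognizer output from being interpreted as markup. A terminal that cannot render ruby would show the reading in parentheses after each word, which still conveys both pieces of information.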

When the control unit 10 (see FIG. 2) of the call control unit 1 receives the data output from the screen creation unit 3 (the screen display data), it transmits the screen display data to the called-side line.

着呼側の電話端末６では、画面表示データを受信すると、画面に、発話側で発声された音声認識結果と、その読み情報が表示される。このため、音声認識結果に、多少の誤りがあっても、容易に理解することが出来る。 When the telephone terminal 6 on the called side receives the screen display data, the voice recognition result uttered on the speaking side and the reading information thereof are displayed on the screen. For this reason, even if there are some errors in the speech recognition result, it can be easily understood.

着呼者の音声は、呼制御部１で受信され、呼制御部１に音声蓄積部１１(図２参照)に一旦蓄積された後、発呼者側プロトコルに合わせた形で、発呼者側に送信される。 The called party's voice is received by the call control unit 1, temporarily stored in the voice storage unit 11 (see FIG. 2) of the call control unit 1, and then transmitted to the caller side in a form that matches the caller-side protocol.

このようにして、健常者の発呼側と、聾者の着呼側で、会話をすることができる。 In this way, a conversation can take place between a hearing person on the calling side and a deaf person on the called side.

次に、図３は、本実施例の動作を説明するためのフローチャートである。図１、図２、図３（Ａ）を参照して、本実施例の動作を説明する。ただし、呼は既に繋がっているものとする。 FIG. 3 is a flowchart for explaining the operation of this embodiment. The operation of this embodiment will be described with reference to FIG. 1, FIG. 2, and FIG. 3(A). It is assumed that the call is already connected.

発呼側端末５からの音声信号を呼制御部１が受信する（ステップＳ１）。 The call control unit 1 receives a voice signal from the calling terminal 5 (step S1).

入力された音声信号を、呼制御部１の制御部１０で登録、管理し、呼制御部１の音声蓄積部１１に音声を蓄積する（ステップＳ２）。 The input voice signal is registered and managed by the control unit 10 of the call control unit 1, and the voice is stored in the voice storage unit 11 of the call control unit 1 (step S2).

呼制御部１の制御部１０は、蓄積した音声を、サービスセンタ４の音声認識部２に音声認識処理を依頼して送信する（ステップＳ３）。 The control unit 10 of the call control unit 1 requests the voice recognition unit 2 of the service center 4 for voice recognition processing and transmits the accumulated voice (step S3).

サービスセンタ４の音声認識部２では、呼制御部１の制御部１０から受信した音声信号を音響分析等して音声認識し、出力結果を、認識結果と読み情報からなる出力データとして、呼制御部１の制御部１０に出力する。 The voice recognition unit 2 of the service center 4 recognizes the voice signal received from the control unit 10 of the call control unit 1 by acoustic analysis or the like, and outputs the result to the control unit 10 of the call control unit 1 as output data consisting of the recognition result and reading information.

呼制御部１の制御部１０は、サービスセンタ４の音声認識部２から受信した出力結果をサービスセンタ４の画面作成部３に送信する（ステップＳ４）。 The control unit 10 of the call control unit 1 transmits the output result received from the voice recognition unit 2 of the service center 4 to the screen creation unit 3 of the service center 4 (step S4).

サービスセンタ４の画面作成部３では、受信した出力データを基に、音声認識結果（主に仮名漢字交じり文章）と、その読み情報からなる画面情報データを作成し、呼制御部１の制御部１０に出力する（ステップＳ５）。 Based on the received output data, the screen creation unit 3 of the service center 4 creates screen information data consisting of the voice recognition result (mainly mixed kana-kanji text) and its reading information, and outputs it to the control unit 10 of the call control unit 1 (step S5).

呼制御部１の制御部１０は、作成された画面情報を受信すると、それを着呼側の回線に送信する（ステップＳ６）。 Upon receiving the created screen information, the control unit 10 of the call control unit 1 transmits it to the incoming call line (step S6).

画面表示データを着呼側の端末（画面表示機能付き電話端末）６で受信し、認識結果、読み情報を、着呼側の端末６の画面に表示する（ステップＳ７）。 The screen display data is received by the called terminal (phone terminal with screen display function) 6, and the recognition result and the reading information are displayed on the screen of the called terminal 6 (step S7).

こうして、着呼側の画面表示機能付き電話端末６で、画面情報を受け取って表示すると、音声認識結果(例えば仮名漢字文)と読み情報（例えば平仮名表記）とが表示される。音声認識結果が正しい場合は、全く問題はないが、仮名漢字文等の認識結果に誤りを含んでいる場合にも、読み情報から、本来の正しい発話内容を類察し、正しい発話内容の見当をつけることができる。 In this way, when the called-side telephone terminal 6 with a screen display function receives and displays the screen information, the voice recognition result (for example, kana-kanji text) and the reading information (for example, in hiragana) are displayed. If the voice recognition result is correct, there is no problem at all; but even if the recognition result, such as the kana-kanji text, contains errors, the user can infer the originally intended utterance from the reading information and form a good guess at the correct utterance content.
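
Steps S1 to S7 can be summarized in a short Python sketch. The class and parameter names (`CallControlUnit`, `recognize`, `create_screen`, `send_to_called_line`) are hypothetical, and the recognizer is a stub that returns a fixed result; no real speech recognition engine is implied.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RecognitionOutput:
    text: str     # recognition result, e.g. kana-kanji text
    reading: str  # reading information, e.g. hiragana


@dataclass
class CallControlUnit:
    # The three callables stand in for the voice recognition unit 2,
    # the screen creation unit 3, and the called-side line (all assumed).
    recognize: Callable[[bytes], RecognitionOutput]
    create_screen: Callable[[RecognitionOutput], str]
    send_to_called_line: Callable[[str], None]
    voice_storage: List[bytes] = field(default_factory=list)  # unit 11

    def handle_caller_voice(self, voice: bytes) -> None:
        self.voice_storage.append(voice)      # S1-S2: receive and store
        output = self.recognize(voice)        # S3: request recognition
        screen = self.create_screen(output)   # S4-S5: build screen data
        self.send_to_called_line(screen)      # S6: send; S7 is the display


def stub_recognizer(voice: bytes) -> RecognitionOutput:
    # Stub: always returns the same text and reading.
    return RecognitionOutput(text="電話", reading="でんわ")


def plain_screen(output: RecognitionOutput) -> str:
    # Plain-text stand-in for the screen data: result line, then reading.
    return f"{output.text}\n{output.reading}"


received: List[str] = []
unit = CallControlUnit(stub_recognizer, plain_screen, received.append)
unit.handle_caller_voice(b"\x00\x01")
print(received[0])
```

Passing the recognizer and screen creator in as callables mirrors the description, in which the control unit 10 only relays data between the units of the service center.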

図３（Ｂ）を参照すると、次に、受信者の音声は回線を通り、呼制御部１に到達する（ステップＳ１１）。 Referring to FIG. 3B, the recipient's voice passes through the line and reaches the call control unit 1 (step S11).

受信された音声は、呼制御部１の制御部１０によって、呼制御部１の音声蓄積部１１に蓄積される（ステップＳ１２）。 The received voice is stored in the voice storage unit 11 of the call control unit 1 by the control unit 10 of the call control unit 1 (step S12).

呼制御部１のメディア変換部１２は、図３（Ａ）のステップＳ２で登録した情報に基づき、適切な発呼者に対し、発呼側の端末５にあわせた適切な手順で送信する（ステップＳ１３）。 Based on the information registered in step S2 of FIG. 3(A), the media conversion unit 12 of the call control unit 1 transmits the voice to the appropriate caller using a procedure suited to the calling terminal 5 (step S13).

発呼側の端末５で着呼側発声音声が受信され再生される（ステップＳ１４）。 The calling side terminal 5 receives and reproduces the called side uttered voice (step S14).

図４は、本発明の一実施例の着呼側の端末（画面表示機能付き電話端末）６の画面の一例を示す図である。例えば発呼元から、「ｉ−ｍｏｄｅ（登録商標）は使えないのですか？」と、呼制御部１に入力があったとする（図３（Ａ）のステップＳ１）。 FIG. 4 is a diagram showing an example of the screen of the called terminal (telephone terminal with a screen display function) 6 in one embodiment of the present invention. Suppose, for example, that the caller says "ｉ−ｍｏｄｅ（登録商標）は使えないのですか？" ("Can't i-mode (registered trademark) be used?") and this is input to the call control unit 1 (step S1 in FIG. 3(A)).

呼制御部１では、その音声信号と呼情報を、呼制御部１の制御部１０で登録し、音声信号を、呼制御部１の音声蓄積部１１に蓄積する（ステップＳ２）。 In the call control unit 1, the voice signal and call information are registered in the control unit 10 of the call control unit 1, and the voice signal is stored in the voice storage unit 11 of the call control unit 1 (step S2).

呼制御部１の制御部１０は、蓄積した音声信号を、サービスセンタ４の音声認識部２に認識処理を依頼し送信する（ステップＳ３）。 The control unit 10 of the call control unit 1 requests the voice recognition unit 2 of the service center 4 for recognition processing and transmits the accumulated voice signal (step S3).

サービスセンタ４の音声認識部２では、音声蓄積部１１から音声信号を受け取ると、その音声信号を分析し、認識結果：「愛も独活は使えないのですか」、及び、読み情報：「あいもうどはつかえないのですか」からなる出力データを、呼制御部１の制御部１０に出力する。特に制限されないが、この例の場合、認識結果は、仮名漢字変換した文であり、読み情報は平仮名表記である。 When the voice recognition unit 2 of the service center 4 receives the voice signal from the voice storage unit 11, it analyzes the voice signal and outputs, to the control unit 10 of the call control unit 1, output data consisting of:
Recognition result: "愛も独活は使えないのですか" (a misrecognition in which "i-mode" has been rendered as 愛も独活, read "ai mo udo"), and
Reading information: "あいもうどはつかえないのですか" ("aimoudo wa tsukaenai no desu ka").
Although not particularly limited, in this example the recognition result is a kana-kanji converted sentence and the reading information is written in hiragana.

呼制御部１の制御部１０は、出力データを、サービスセンタ４の画面作成部３に画面情報作成処理を依頼し、送信する（ステップＳ４）。 The control unit 10 of the call control unit 1 requests the screen creation unit 3 of the service center 4 for screen information creation processing and transmits the output data (step S4).

サービスセンタ４の画面作成部３では、受信したデータを基に、音声認識結果が入った文字情報と読み情報からなる画面情報を作成し、呼制御部１の制御部１０に送信する（ステップＳ５）。 Based on the received data, the screen creation unit 3 of the service center 4 creates screen information consisting of character information containing the voice recognition result and the reading information, and transmits it to the control unit 10 of the call control unit 1 (step S5).

呼制御部１の制御部１０では、作成された画面情報を受信すると、それを着呼側の回線に送信する（ステップＳ６）。 Upon receipt of the created screen information, the control unit 10 of the call control unit 1 transmits it to the called-side line (step S6).

画面情報を着呼側の画面表示機能付き電話端末６で受信し、発話内容認識結果と発話内容読み情報が表示される（ステップＳ７）。 The screen information is received by the telephone terminal 6 with the screen display function on the called side, and the utterance content recognition result and the utterance content reading information are displayed (step S7).

図４において、発話内容認識結果は、「愛も独活」と表示され、もし「独活（うど）」の読み方を知らなければ、着呼側では、「あいもどっかつ」とは何の意味かと判断に悩むことになる。 In FIG. 4, the utterance content recognition result is displayed as "愛も独活". If the called party does not know that "独活" is read "udo", he or she may read it as "ai mo dokkatsu" and be puzzled about what it means.

しかし、下段の読み情報をみると、読み情報で、「あうもうど」と、表示されているので、音声認識で誤りを含んでいたとしても（例えば音声認識における単語の決定処理や仮名漢字変換処理に誤りがある場合にも）、読み情報から、正しい発話内容を類察することで、正しい発話内容である「ｉ−ｍｏｄｅ」の見当をつけることができる。 However, the reading information on the lower line shows "aimoudo". Therefore, even if the voice recognition contains an error (for example, in the word determination process or the kana-kanji conversion process), the called party can infer the intended utterance from the reading information and guess that the correct content is "i-mode".

次に、着呼側（例えば聾者）で発声する（図４（Ｂ）のステップＳ１１）。 Next, the incoming call side (for example, a deaf person) speaks (step S11 in FIG. 4B).

呼制御部１の制御部１０は、受信した音声を音声蓄積部１１に蓄積する（ステップＳ１２）。 The control unit 10 of the call control unit 1 stores the received voice in the voice storage unit 11 (step S12).

呼制御部１のメディア変換部１２は、登録情報に基づき蓄積された音声を発呼者に送信する（ステップＳ１３）。このとき、たとえば、着呼側が、HTTP(Hyper Text Transport Protocol)、発呼側がVoIPであれば、RTP（Real-time Transport Protocol）に変換し、発呼側がPSTN(Public Switched Telephone Networks)であれば、デジタル・ハードウエア回線に出力する。 Based on the registration information, the media conversion unit 12 of the call control unit 1 transmits the stored voice to the caller (step S13). At this time, for example, if the called side uses HTTP (Hypertext Transfer Protocol) and the calling side uses VoIP, the voice is converted to RTP (Real-time Transport Protocol); if the calling side is on the PSTN (Public Switched Telephone Network), the voice is output to a digital hardware line.
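
The branching in step S13 amounts to choosing an output path from the registered type of the caller's network. A minimal Python sketch follows; the enum values and the returned labels are assumptions for illustration, and no actual RTP or PSTN transmission is performed.

```python
from enum import Enum


class CallerNetwork(Enum):
    VOIP = "voip"
    PSTN = "pstn"


def select_output_path(caller_network: CallerNetwork) -> str:
    """Pick how stored called-side voice is delivered to the caller."""
    if caller_network is CallerNetwork.VOIP:
        return "rtp"                    # convert to RTP for a VoIP caller
    if caller_network is CallerNetwork.PSTN:
        return "digital_hardware_line"  # output to the digital line
    raise ValueError(f"unsupported caller network: {caller_network}")
```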

発呼者側の端末５で着呼者の音声が再生される（ステップＳ１４）。このようにして、発呼者（健常者）と聾者（着呼側）で会話をすることが出来る。 The called party's voice is reproduced at the caller-side terminal 5 (step S14). In this way, the caller (a hearing person) and the deaf person (on the called side) can converse.

次に、本発明の第２の実施例について説明する。図５は、本発明の第２の実施例の構成を示す図である。本実施例では、読み情報を、サービスセンタ４’側の画面作成部３１で付加する。 Next, a second embodiment of the present invention will be described. FIG. 5 is a diagram showing the configuration of the second exemplary embodiment of the present invention. In this embodiment, the reading information is added by the screen creation unit 31 on the service center 4 'side.

この場合、音声認識部２で読みを出力する必要がなくなるので、音声認識部２は、既存のものをそのまま使用することができる。 In this case, it is not necessary for the speech recognition unit 2 to output a reading, so that the speech recognition unit 2 can use the existing one as it is.

図５を参照すると、本実施例は、呼制御部１と、音声認識部２と画面作成部３１と読みつけ生成部３２とを有するサービスセンタ４’を備えている。 Referring to FIG. 5, this embodiment includes a call control unit 1 and a service center 4' having a voice recognition unit 2, a screen creation unit 31, and a reading generation unit 32.

呼制御部１は、図２に示した構成と同様に、制御部１０と、音声蓄積部１１と、メディア変換部１２を備えている。ただし、呼制御部１の制御部１０は、音声認識部２と画面作成部３１と読みつけ生成部３２とに接続する。 The call control unit 1 includes a control unit 10, a voice storage unit 11, and a media conversion unit 12, similarly to the configuration illustrated in FIG. 2. However, the control unit 10 of the call control unit 1 is connected to the voice recognition unit 2, the screen creation unit 31, and the reading generation unit 32.

呼が接続されたあと、音声認識部２は、受信した音声を音声認識し、その出力結果（音声認識結果のみ）を、呼制御部１の制御部１０に送信する。 After the call is connected, the voice recognition unit 2 recognizes the received voice and transmits the output result (only the voice recognition result) to the control unit 10 of the call control unit 1.

呼制御部１の制御部１０は、その出力結果を音声蓄積部１１に保持し、サービスセンタ４’の読みつけ生成部３２に送る。 The control unit 10 of the call control unit 1 holds the output result in the voice storage unit 11 and sends it to the reading generation unit 32 of the service center 4 ′.

サービスセンタ４’の読みつけ生成部３２では、音声認識結果から読みを推定し、読み情報を、呼制御部１の制御部１０に送信する。 The reading generation unit 32 of the service center 4 ′ estimates the reading from the voice recognition result and transmits the reading information to the control unit 10 of the call control unit 1.
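
One simple way to picture this estimation step is a dictionary lookup with greedy longest match, sketched below in Python. This is an assumption for illustration only: the specification does not state how the reading generation unit 32 estimates readings, and the tiny `LEXICON` is hypothetical. Characters not found in the lexicon (such as kana) are passed through unchanged.

```python
from typing import Dict, List

# Hypothetical word-to-reading table; a real unit would need a full dictionary.
LEXICON: Dict[str, str] = {"愛": "あい", "独活": "うど", "使": "つか"}


def estimate_reading(text: str, lexicon: Dict[str, str]) -> str:
    """Estimate a hiragana reading by greedy longest-match lookup."""
    max_len = max((len(key) for key in lexicon), default=1)
    parts: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in lexicon:
                parts.append(lexicon[chunk])
                i += n
                break
        else:
            # No entry matched: keep the character as it is (e.g. kana).
            parts.append(text[i])
            i += 1
    return "".join(parts)
```

With this lexicon, the misrecognized text from the first embodiment, `愛も独活は使えないのですか`, maps to `あいもうどはつかえないのですか`.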

呼制御部１の制御部１０は、サービスセンタ４’の音声認識部２からの出力結果と、読みつけ生成部３２からの読み表記を、サービスセンタ４’の画面作成部３１に送る。 The control unit 10 of the call control unit 1 sends the output result from the voice recognition unit 2 of the service center 4 ′ and the reading notation from the reading generation unit 32 to the screen creation unit 31 of the service center 4 ′.

サービスセンタ４’の画面作成部３１は、音声認識結果と読み情報を、画面表示機能付き電話端末６で表示できる形（主にＨＴＭＬ言語などのページ記述言語）に加工し、制御部１０に送信する。 The screen creation unit 31 of the service center 4' processes the voice recognition result and the reading information into a form that can be displayed on the telephone terminal 6 with a screen display function (mainly a page description language such as HTML) and transmits it to the control unit 10.

呼制御部１の制御部１０は、サービスセンタ４’の画面作成部３１から出力されたデータを受信すると、それを着呼側回線に送信する。 When the control unit 10 of the call control unit 1 receives the data output from the screen creation unit 31 of the service center 4 ′, it transmits it to the called line.

なお、着呼者の音声は、前記実施例と同様にそのまま音声として発呼者に送信される。 Note that, as in the preceding embodiment, the called party's voice is transmitted to the caller as voice, without conversion to text.

図６は、本発明の第２の実施例の動作を説明するフローチャートである。図５、図２、及び図６を参照して、本発明の第２の実施例の動作を説明する。 FIG. 6 is a flowchart for explaining the operation of the second embodiment of the present invention. The operation of the second embodiment of the present invention will be described with reference to FIG. 5, FIG. 2, and FIG. 6.

発呼者からの音声信号を呼制御部１が受信する（ステップＳ２１）。 The call control unit 1 receives a voice signal from the caller (step S21).

入力された音声信号を呼制御部１の制御部１０で登録、管理し、呼制御部１の音声蓄積部１１に音声を蓄積する（ステップＳ２２）。 The input voice signal is registered and managed by the control unit 10 of the call control unit 1, and the voice is stored in the voice storage unit 11 of the call control unit 1 (step S22).

呼制御部１の制御部１０は、蓄積した音声を、サービスセンタ４’の音声認識部２に送信する（ステップＳ２３）。 The control unit 10 of the call control unit 1 transmits the accumulated voice to the voice recognition unit 2 of the service center 4 '(step S23).

サービスセンタ４’の音声認識部２では、呼制御部１の制御部１０から受信した音声信号を分析認識し、出力結果を認識結果を出力データとして、呼制御部１の制御部１０に出力する。 The voice recognition unit 2 of the service center 4' analyzes and recognizes the voice signal received from the control unit 10 of the call control unit 1, and outputs the recognition result as output data to the control unit 10 of the call control unit 1.

呼制御部１の制御部１０は受信した出力結果を、サービスセンタ４’の読みつけ生成部３２に送信する（ステップＳ２４）。 The control unit 10 of the call control unit 1 transmits the received output result to the reading generation unit 32 of the service center 4 '(step S24).

サービスセンタ４’の読みつけ生成部３２では、受信したデータから読み情報を推定し、呼制御部１の制御部１０に結果を送信する（ステップＳ２５）。 The reading generation unit 32 of the service center 4 'estimates reading information from the received data, and transmits the result to the control unit 10 of the call control unit 1 (step S25).

呼制御部１の制御部１０は、サービスセンタ４’の音声認識部２の出力結果と、読みつけ生成部３２の出力結果を、サービスセンタ４’の画面作成部３１に送信する（ステップＳ２６）。 The control unit 10 of the call control unit 1 transmits the output result of the voice recognition unit 2 of the service center 4' and the output result of the reading generation unit 32 to the screen creation unit 31 of the service center 4' (step S26).

サービスセンタ４’の画面作成部３１では、受信したデータを基に、音声認識結果が入った文字情報と読み情報からなる画面情報データを作成し、呼制御部１の制御部１０に送信する（ステップＳ２７）。 Based on the received data, the screen creation unit 31 of the service center 4' creates screen information data consisting of character information containing the voice recognition result and the reading information, and transmits it to the control unit 10 of the call control unit 1 (step S27).

呼制御部１の制御部１０では、作成された画面情報を受信すると、それを着呼側の回線に送信する（ステップＳ２８）。 Upon receiving the created screen information, the control unit 10 of the call control unit 1 transmits it to the incoming call line (step S28).

画面表示データを端末６（着呼側の画面表示機能付き電話端末）で受信し、発話内容認識結果、発話内容読み情報を着呼側の画面表示機能付き電話端末６で表示する（ステップＳ２９）。 The screen display data is received by the terminal 6 (the called-side telephone terminal with a screen display function), and the utterance content recognition result and the utterance content reading information are displayed on the called-side telephone terminal 6 with a screen display function (step S29).

着呼側の画面表示機能付き電話端末６で画面情報を受け取り、表示すると、音声認識結果と、その読み情報とが同一画面に表示されるため、音声認識結果が正しかった場合はもちろん、誤りを含んでいたとしても、読み情報から正解発音を類察し正しい発話内容の見当をつけることができる。 When the called-side telephone terminal 6 with a screen display function receives and displays the screen information, the voice recognition result and its reading information appear on the same screen. Therefore, not only when the voice recognition result is correct but also when it contains errors, the user can infer the correct pronunciation from the reading information and guess the correct utterance content.

次に、着呼者の電話音声は回線を通り、呼制御部１に到達する（図３（Ｂ）のステップＳ１１）。 Next, the telephone voice of the called party passes through the line and reaches the call control unit 1 (step S11 in FIG. 3B).

受信された音声は、呼制御部１の制御部１０によって音声蓄積部１１に蓄積される（図３（Ｂ）のステップＳ１２）。 The received voice is stored in the voice storage unit 11 by the control unit 10 of the call control unit 1 (step S12 in FIG. 3B).

呼制御部１のメディア変換部１２はステップＳ１で登録した情報に基づき、適切な発呼者に対し、発呼者にあわせた適切な手順で送信する（図３（Ｂ）のステップＳ１３）。 Based on the information registered in step S1, the media conversion unit 12 of the call control unit 1 transmits to an appropriate caller in an appropriate procedure according to the caller (step S13 in FIG. 3B).

発呼側の端末５で、着呼側の発声した音声が受信され再生される（図３（Ｂ）のステップＳ１４）。 The calling side terminal 5 receives and reproduces the voice uttered by the called side (step S14 in FIG. 3B).

なお、本発明の第３の実施例として、読み情報を、着呼側の端末６側で生成するようにしてもよい。この場合、図１、図５のサービスセンタ４、４’の音声認識部２あるいは読みつけ生成部３２において読み情報を生成する必要がなくなり、また画面作成部３において読み情報を付加する必要がなくなるため、サービスセンタ側の処理負荷、負担が軽減される。本発明の第３の実施例の処理手順については、サービスセンタ側では、読み情報を扱わず、認識結果の表示された画面情報を受信した端末６側で、読みつけ生成部が起動し、読み情報を生成する。他の処理は、前記実施例の手順に従う。 As a third embodiment of the present invention, the reading information may be generated on the called terminal 6. In this case, the voice recognition unit 2 or the reading generation unit 32 of the service centers 4 and 4' in FIGS. 1 and 5 no longer needs to generate reading information, and the screen creation unit 3 no longer needs to add it, so the processing load on the service center side is reduced. In the processing procedure of the third embodiment, the service center side does not handle reading information; instead, a reading generation unit starts on the terminal 6 that has received the screen information showing the recognition result, and generates the reading information. The other processing follows the procedure of the preceding embodiments.
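
A sketch of this division of work, in Python and under stated assumptions: the service center sends only the recognition result, and a hypothetical `CalledTerminal` class applies its own reading generator before display. The class, its method names, and the stub reading generator are illustrative and are not taken from the specification.

```python
from typing import Callable, List


class CalledTerminal:
    """Called-side terminal that generates reading information locally."""

    def __init__(self, generate_reading: Callable[[str], str]) -> None:
        self.generate_reading = generate_reading  # terminal-side reading unit
        self.screen_lines: List[str] = []

    def receive(self, recognition_text: str) -> None:
        # Screen data from the service center carries no reading information,
        # so the terminal derives it here before displaying both lines.
        reading = self.generate_reading(recognition_text)
        self.screen_lines = [recognition_text, reading]


def stub_reading(text: str) -> str:
    # Stub lookup table standing in for a real reading estimator.
    return {"電話": "でんわ"}.get(text, text)


terminal = CalledTerminal(stub_reading)
terminal.receive("電話")
```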

本発明は、福祉、社会サービス等の電話サービスに提供して好適とされる。 The present invention is suitable for providing telephone services such as welfare and social services.

以上、本発明を上記実施例に即して説明したが、本発明は上記実施例の構成にのみ制限されるものでなく、本発明の範囲内で当業者であればなし得るであろう各種変形、修正を含むことは勿論である。 Although the present invention has been described with reference to the above embodiments, it is not limited to the configurations of those embodiments and naturally includes the various variations and modifications that a person skilled in the art could make within the scope of the present invention.

本発明の一実施例の構成を示す図である。 A diagram showing the configuration of one embodiment of the present invention.

本発明の一実施例の呼制御部の構成を示す図である。 A diagram showing the configuration of the call control unit of one embodiment of the present invention.

本発明の一実施例の動作を説明するための流れ図である。 A flowchart for explaining the operation of one embodiment of the present invention.

本発明の一実施例の着呼側の画面表示機能付き電話端末の画面表示例を示す図である。 A diagram showing an example of the screen display of the called-side telephone terminal with a screen display function in one embodiment of the present invention.

本発明の別の実施例の構成を示す図である。 A diagram showing the configuration of another embodiment of the present invention.

本発明の別の実施例の動作を説明するための流れ図である。 A flowchart for explaining the operation of another embodiment of the present invention.

Explanation of symbols

１呼制御部 1 Call control unit
２音声認識部 2 Voice recognition unit
３、３１画面作成部 3, 31 Screen creation unit
３２読みつけ生成部 32 Reading generation unit
４、４’ サービスセンタ 4, 4' Service center
５発呼側の端末 5 Calling terminal
６着呼側の端末 6 Called terminal
１０制御部 10 Control unit
１１音声蓄積部 11 Voice storage unit
１２メディア変換部 12 Media conversion unit

Claims

A telephone system comprising:
a call control unit that is connected to a line and controls call connection; and
a service center that is connected to the call control unit and includes a voice recognition unit and a screen creation unit,
wherein, when a call from a caller is connected to the voice recognition unit of the service center via the call control unit, the voice recognition unit performs voice recognition on utterance content from the caller, converts the recognition result, in which words have been determined by the voice recognition, into character information corresponding to the utterance content, adds reading information to the character information, and passes them to the screen creation unit via the call control unit,
the screen creation unit creates screen data that combines the character information corresponding to the utterance content with its reading information and passes the screen data to the call control unit,
the call control unit transmits the screen data to the called terminal, and
the called terminal displays the character information corresponding to the utterance content and its reading information on a screen.

The telephone system according to claim 1, wherein
the service center includes a reading generation unit separate from the voice recognition unit,
after the call from the caller is connected, the voice recognition unit of the service center performs voice recognition on the utterance content from the caller and converts the recognition result, in which words have been determined by the voice recognition, into character information corresponding to the utterance content, the voice recognition unit does not generate reading information for the character information, and the voice recognition unit passes the character information corresponding to the utterance content to the call control unit,
the call control unit sends the character information corresponding to the utterance content to the reading generation unit of the service center,
the reading generation unit generates reading information by estimating the reading from the character information corresponding to the utterance content, and transmits the reading information to the call control unit,
the call control unit sends the character information corresponding to the utterance content output from the voice recognition unit of the service center and the reading information from the reading generation unit of the service center to the screen creation unit of the service center, and
the screen creation unit creates screen data that combines the character information corresponding to the utterance content with its reading information.

The telephone system according to claim 1, wherein
a reading generation unit, separate from the voice recognition unit, is provided in the called terminal,
the voice recognition unit of the service center does not generate reading information, and the screen creation unit creates screen data of the character information corresponding to the utterance content from the voice recognition unit, and
the reading generation unit of the called terminal generates reading information by estimating the reading from the character information corresponding to the utterance content.

The telephone system according to claim 1, wherein voice from the called terminal is transmitted to the calling terminal, and is received and reproduced at the calling terminal.

A service center comprising a voice recognition unit connected to a call control unit, and a screen creation unit, wherein
when a call from a caller is connected to the voice recognition unit via the call control unit, the voice recognition unit performs voice recognition on utterance content from the caller, converts the recognition result, in which words have been determined by the voice recognition, into character information corresponding to the utterance content, adds reading information to the character information, and passes them to the screen creation unit via the call control unit, and
the screen creation unit creates screen data that combines the character information corresponding to the utterance content with its reading information, and transmits the screen data to the called terminal via the call control unit.

The service center according to claim 5, further comprising a reading generation unit separate from the voice recognition unit, wherein
after the call from the caller is connected, the voice recognition unit performs voice recognition on the utterance content from the caller and converts the recognition result, in which words have been determined by the voice recognition, into character information corresponding to the utterance content, the voice recognition unit does not generate reading information for the character information, and the voice recognition unit sends the character information corresponding to the utterance content to the reading generation unit via the call control unit,
the reading generation unit generates reading information by estimating the reading from the character information corresponding to the utterance content, and transmits the reading information to the call control unit, and
the call control unit sends the character information corresponding to the utterance content output from the voice recognition unit of the service center and the reading information from the reading generation unit to the screen creation unit of the service center.

The telephone system according to any one of claims 1 to 3, wherein
the character information is a kana-kanji converted form of the recognition result in which words have been determined by the voice recognition, and
the reading information represents the character information corresponding to the utterance content in at least one of hiragana, romaji, and phonetic symbols.

A call assistance method for a telephone system comprising:
a call control unit that is connected to a line and controls call connection; and
a service center that is connected to the call control unit and includes a voice recognition unit and a screen creation unit,
wherein, when a call from a caller is connected to the voice recognition unit of the service center via the call control unit, the voice recognition unit performs voice recognition on utterance content from the caller, converts the recognition result, in which words have been determined by the voice recognition, into character information corresponding to the utterance content, adds reading information to the character information, and passes them to the screen creation unit via the call control unit,
the screen creation unit creates screen data that combines the character information corresponding to the utterance content with its reading information and passes the screen data to the call control unit, and the call control unit transmits the screen data to the called terminal, and
the called terminal displays the character information corresponding to the utterance content and its reading information on a screen.

The call assistance method for a telephone system according to claim 8, wherein
the service center includes a reading generation unit separate from the voice recognition unit,
after the call from the caller is connected, the voice recognition unit of the service center performs voice recognition on the utterance content from the caller and converts the recognition result, in which words have been determined by the voice recognition, into character information corresponding to the utterance content, the voice recognition unit does not generate reading information for the character information, and the voice recognition unit passes the character information corresponding to the utterance content to the call control unit,
the call control unit sends the character information corresponding to the utterance content to the reading generation unit of the service center,
the reading generation unit estimates the reading from the character information corresponding to the utterance content and transmits the reading information to the call control unit,
the call control unit sends the character information corresponding to the utterance content output from the voice recognition unit of the service center and the reading information from the reading generation unit to the screen creation unit of the service center, and
the screen creation unit creates screen data that combines the character information corresponding to the utterance content with its reading information.

The call assistance method for a telephone system according to claim 8, wherein voice from the called terminal is transmitted to the calling terminal, and is received and reproduced at the calling terminal.

The call assistance method for a telephone system according to claim 8 or 9, wherein
the character information is a kana-kanji converted form of the recognition result in which words have been determined by the voice recognition, and
the reading information represents the character information corresponding to the utterance content in at least one of hiragana, romaji, and phonetic symbols.

A program for causing a computer constituting a service center, which comprises a voice recognition unit connected to a call control unit and a screen creation unit, to execute the respective processes of the voice recognition unit and the screen creation unit, wherein
when a call from a caller is connected to the voice recognition unit via the call control unit,
the voice recognition unit executes a process of performing voice recognition on utterance content from the caller, converting the recognition result, in which words have been determined by the voice recognition, into character information corresponding to the utterance content, adding reading information to the character information, and passing them to the screen creation unit via the call control unit, and
the screen creation unit executes a process of creating screen data that combines the character information corresponding to the utterance content with its reading information, passing the screen data to the call control unit, and transmitting the screen data to the called terminal via the call control unit.

The program according to claim 12, wherein
the service center includes a reading generation unit separate from the voice recognition unit,
after the call from the caller is connected, the voice recognition unit performs voice recognition on the utterance content from the caller and converts the recognition result, in which words have been determined by the voice recognition, into character information corresponding to the utterance content, the voice recognition unit does not generate reading information for the character information, and the voice recognition unit executes a process of sending the character information corresponding to the utterance content to the reading generation unit via the call control unit,
the reading generation unit executes a process of estimating the reading from the character information corresponding to the utterance content and transmitting the reading information to the call control unit, and
the program causes the computer constituting the service center to execute the respective processes of the voice recognition unit, the reading generation unit, and the screen creation unit.

The program according to claim 12 or 13, wherein
the character information is a kana-kanji converted form of the recognition result in which words have been determined by the voice recognition, and
the reading information represents the estimated reading of the character information corresponding to the utterance content in at least one of hiragana, romaji, and phonetic symbols.