JP2009145434A

JP2009145434A - Speech recognition system

Info

Publication number: JP2009145434A
Application number: JP2007320332A
Authority: JP
Inventors: Zuisho O; 瑞璋王
Original assignee: CHUHEI O; O CHUHEI; ZUISHO O
Current assignee: CHUHEI O; O CHUHEI; ZUISHO O
Priority date: 2007-12-12
Filing date: 2007-12-12
Publication date: 2009-07-02

Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition system that uses a waveform to represent the recording state, the speech-recognition-in-progress state, and the speech-recognition-completed state.

SOLUTION: The speech recognition system comprises at least one speech recognition engine and a display device, on which a signal indicator interface and a character output interface are provided. The signal indicator interface represents the speech signal input by the user as a waveform and displays the recording state, the recognition-in-progress state, or the recognition-completed state. The character output interface displays the recognized text result, which includes at least one word unit. Each displayed word unit and its corresponding waveform unit are each linked to a set of feedback adjustment selection items. Each set includes at least one feedback adjustment selection item and lets the user select a command to correct a speech recognition error or to perform feedback adjustment of the speech recognition system.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a speech recognition system, and in particular to a visual feedback interface for speech recognition technology. It applies to electronic devices with a graphical display interface, such as desktop personal computers, notebook personal computers, home multimedia systems, televisions, DVD players, video systems, mobile phones, and PDAs. The interface can display the recording state, the speech processing state, or the speech recognition completion state as a waveform, and it provides at least one feedback adjustment selection item on the waveform and character display interfaces so that the user can select a command to correct a speech recognition error or adjust the speech recognition system.

In recent years, advances in speech recognition technology have made many electronic devices more convenient to use. Desktop personal computers, notebook personal computers, mobile phones, PDAs, and other electronic devices all require coordination between vision and limbs for input in order to achieve control. For example, a user operating a personal computer needs a keyboard, a mouse, or another attached control device to complete command input. A touch screen can simplify input, but its display area is limited and it must still be operated with the fingers, so maximum convenience has not yet been achieved. For most people these problems may be merely inconvenient, but for users with physical disabilities, disorders of the motor nerves, or visual impairments, operating such devices is difficult. Speech recognition technology can solve exactly these problems.

In applications of speech recognition, the user needs only a speech signal input device, such as a microphone. The speech recognition system recognizes the speech input and then either outputs the text corresponding to the speech or directly executes a command based on the recognition result.

When using a speech recognition system, as described above, the user inputs and records speech through the speech signal input device, after which the recognition process begins. During recording and recognition, many factors affect the final recognition result, such as the type of speech signal input device, the recording environment, and the distance from the input device. There is therefore a need to monitor the recording and recognition processes. Some conventional techniques use different images to display the recording state and the recognition state, or use changes in an image to display them. However, while displaying the state, they cannot also reflect the quality of the recording or recognition result, or whether the recording succeeded.

Some conventional techniques also provide limited adjustment functions based on the recognition result, but most apply only to the recognition result as a whole. They cannot adjust a particular part of the result and improve the speech recognition system through feedback, and so cannot meet the needs of individual users. For example, if a user pronounces a certain character or word with a particular accent and the recognition result for that character or word cannot be fed back and adjusted, the system cannot be tailored to that user, and its practical effectiveness is greatly reduced.

JP 2003-167600 A

An object of the present invention is to provide a speech recognition system that uses a waveform to represent the recording state, the speech-recognition-in-progress state, or the speech-recognition-completed state, so that the user can monitor recording quality, speech processing speed, and recognition result quality directly from the speech signal waveform.

Another object of the present invention is to provide a speech recognition system with an error correction and feedback adjustment mechanism, giving the user an efficient way to correct speech recognition errors or to perform feedback adjustment of the speech recognition system.

To achieve the above objects, the speech recognition system provided by the present invention includes at least one speech recognition engine and a display device, and a signal indicator interface and a character output interface are provided on the display device. The signal indicator interface represents the speech signal input by the user as a waveform and displays the recording state, the speech-recognition-in-progress state, or the speech-recognition-completed state. The character output interface displays the recognized text result, which includes at least one word unit. Each displayed word unit and its corresponding waveform unit are each linked to a feedback adjustment selection item set. Each set includes at least one feedback adjustment selection item and lets the user select a command to correct a speech recognition error or to perform feedback adjustment of the speech recognition system.

The waveform preferably represents the recording state, the speech-recognition-in-progress state, and the speech-recognition-completed state in different colors.

After recognition of the input speech signal is completed, the recognition result is output on the character output interface, and each word unit is preferably aligned with its corresponding waveform unit on the signal indicator interface. Speech recognition quality is broadly divided into three levels, each displayed in a different color: good quality, poor quality, and very poor quality that requires strict inspection and correction.

Each waveform unit of the signal indicator interface is linked to a feedback adjustment selection item set. The set includes at least one selection item and lets the user select a command to efficiently correct a speech recognition error or to perform feedback adjustment of the speech recognition system.

Each word unit of the character output interface is likewise linked to a feedback adjustment selection item set. The set includes at least one selection item and lets the user select a command to efficiently correct a speech recognition error or to perform feedback adjustment of the speech recognition system.

Because the speech recognition system provided by the present invention represents the user's speech signal as a waveform, the user can immediately judge whether the recording succeeded and how good the input speech signal is.

Because the system changes the color of the waveform, the user can conveniently monitor the progress of speech processing and the quality of the recognition result.

With the system, the user can correct errors or perform feedback adjustment on the input speech signal and on the recognized text, word unit by word unit. This makes it convenient to complete text input and allows the performance of the speech recognition system to keep improving.

A preferred embodiment of the speech recognition system of the present invention is described below with reference to the drawings.

FIG. 1 is an explanatory diagram of a speech recognition system that uses the visual feedback interface for speech recognition technology according to an embodiment of the present invention. The system applies to electronic devices with a graphical display interface, such as desktop personal computers, notebook personal computers, home multimedia systems, televisions, DVD players, video systems, mobile phones, and PDAs.

As shown in the figure, the speech recognition system of the embodiment includes at least a speech recognition engine 10 and a display device 20, and the display device 20 is provided with a signal indicator interface 30 and a character output interface 40. The signal indicator interface 30 represents the speech signal input by the user as a waveform 32 and displays the recording state and the speech recognition state. The character output interface 40 displays the text 42 of the recognition result, which includes at least one word unit. In the figure, the waveform 32 on the signal indicator interface 30 shows the speech signal input by the user, and the text 42 on the character output interface 40 is the result obtained after that speech is recognized.

The display device 20 of the speech recognition system of the embodiment can be the display screen of an electronic device such as a desktop personal computer, notebook personal computer, home multimedia system, television, DVD player, AV equipment, mobile phone, or PDA. It can also be connected to a display screen that outputs a video signal or to a display screen on a remote control.

FIG. 2 is an explanatory diagram of the first embodiment of the present invention and shows the recording state. During recording, the user inputs speech into the speech recognition system through a speech signal input device (not shown; for example, a microphone), and the input speech signal is displayed as the waveform 32 on the signal indicator interface 30. Using a waveform has two advantages. First, if for some reason the user's speech signal is not actually being input during recording, for example because the input device has not been activated or because of a poor connection between the input device and the electronic device hosting the speech recognition system, the user can see this from the waveform and react immediately, avoiding wasted time.

Second, the shape of the waveform lets the user roughly judge the input quality of the speech signal and make appropriate adjustments. For example, environmental noise, the sensitivity of the input device, or the way the user handles the input device can all affect input quality. Identifying and eliminating these factors at the recording stage is a significant help to the subsequent recognition process.
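As an illustration only, the following Python sketch shows one common way to reduce recorded samples to a waveform that fits a display of fixed width, by keeping the minimum and maximum sample for each display column. The function names, the normalized sample range, and the automatic silence check are assumptions made for this sketch; the specification describes the user judging the waveform visually and does not prescribe any of this.

```python
from typing import List, Tuple


def waveform_envelope(samples: List[float], columns: int) -> List[Tuple[float, float]]:
    """Reduce audio samples to one (min, max) pair per display column."""
    if columns <= 0:
        raise ValueError("columns must be positive")
    if not samples:
        return [(0.0, 0.0)] * columns
    n = len(samples)
    envelope = []
    for c in range(columns):
        start = c * n // columns
        # Guarantee at least one sample per column, even when n < columns.
        end = max(start + 1, (c + 1) * n // columns)
        chunk = samples[start:end]
        envelope.append((min(chunk), max(chunk)))
    return envelope


def looks_silent(envelope: List[Tuple[float, float]], threshold: float = 0.01) -> bool:
    """Hypothetical helper: True when no column exceeds the threshold,
    which may hint that the input device is off or disconnected."""
    return all(max(abs(lo), abs(hi)) < threshold for lo, hi in envelope)


samples = [0.0, 0.5, -0.5, 0.25]
print(waveform_envelope(samples, 2))
print(looks_silent(waveform_envelope([0.0] * 8, 4)))
```

A flat envelope corresponds to the case described above in which the speech signal is not actually reaching the system.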

As described above, the signal indicator interface 30 of the first embodiment displays the recording state and the speech recognition state by means of the waveform 32, and the speech recognition state further comprises a recognition-in-progress state and a recognition-completed state. The waveforms representing the recording state, the recognition-in-progress state, and the recognition-completed state are displayed in different colors, so that the user can visually judge the processing state, the recognition quality, and the recognition speed.

While the speech signal input by the user is being recognized, the signal waveform 32 on the signal indicator interface 30 shows the portion already processed in a different color, indicating the progress of recognition. In other words, the input speech signal is first displayed in the recording-state color. Once recognition starts, the processed portion of the signal is displayed in the recognition-in-progress color. When the whole input signal has been recognized, it is displayed in the recognition-completed colors: some word units appear in the color for good recognition quality, some in the color for poor quality, and some in the color for very poor quality.

As shown in FIG. 3, the solid-line waveform 321 is the portion for which recognition has been completed, and the dotted-line waveform 322 is the portion for which recognition has not yet been completed. After recognition is finished, the entire waveform changes to a new color to show that processing is complete. Details are described later.
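The progress display described above can be sketched as follows. This is a minimal illustration, assuming the waveform is drawn as a row of display columns and that the engine reports how many columns it has processed; the state names and the specific colors are placeholders, since the specification only requires that the three states use different colors.

```python
from enum import Enum
from typing import List


class State(Enum):
    RECORDING = "recording"
    RECOGNIZING = "recognizing"
    DONE = "done"


# Placeholder colors; any three distinguishable colors would serve.
STATE_COLORS = {
    State.RECORDING: "gray",
    State.RECOGNIZING: "blue",
    State.DONE: "black",
}


def column_colors(columns: int, processed: int, finished: bool) -> List[str]:
    """Color each waveform column according to recognition progress."""
    if finished:
        return [STATE_COLORS[State.DONE]] * columns
    processed = max(0, min(processed, columns))
    return ([STATE_COLORS[State.RECOGNIZING]] * processed
            + [STATE_COLORS[State.RECORDING]] * (columns - processed))


print(column_colors(5, 2, finished=False))
```

In this sketch the processed columns play the role of the solid-line waveform 321 and the remaining columns the role of the dotted-line waveform 322.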

After the user has input the speech signal and recognition of it is complete, the recognition result is preferably displayed on the character output interface 40 one word unit 420 at a time. As shown in FIG. 4, the speech signal input by the user is displayed as the waveform 32, which is divided into at least one waveform unit 320. Each waveform unit 320 corresponds to a word unit 420 in the recognition result, and the correspondence is shown by aligning their positions. In FIG. 4, the user inputs the speech signal "How is the weather today", and the displayed text result is "How is the weather today". The input speech corresponding to one waveform unit 320 is, for example, the word unit 420 "today" in the recognition result. The two are aligned in position, and their recognition quality is indicated by color.
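One way to picture the correspondence between word units and waveform units is as shared time spans mapped onto display coordinates. The sketch below assumes that the recognition engine returns start and end times for each word unit; that assumption, the `WordUnit` type, and the example timings are all hypothetical and are not stated in the specification.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WordUnit:
    text: str
    start_s: float  # assumed start time within the recording, in seconds
    end_s: float    # assumed end time within the recording, in seconds


def waveform_units(words: List[WordUnit], total_s: float,
                   width_px: int) -> List[Tuple[str, int, int]]:
    """Map each word unit to the horizontal pixel span of its waveform unit."""
    if total_s <= 0:
        raise ValueError("total_s must be positive")
    units = []
    for w in words:
        x0 = round(w.start_s / total_s * width_px)
        x1 = round(w.end_s / total_s * width_px)
        units.append((w.text, x0, x1))
    return units


words = [
    WordUnit("today", 0.0, 0.5),
    WordUnit("weather", 0.5, 1.25),
    WordUnit("how", 1.25, 2.0),
]
for text, x0, x1 in waveform_units(words, total_s=2.0, width_px=400):
    print(text, x0, x1)
```

Drawing each word unit under the pixel span of its waveform unit gives the position alignment described for FIG. 4.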

When the speech recognition system is used for speech understanding, the signal indicator interface 30 remains as shown in FIG. 4, while the character output interface 40 outputs the result of speech understanding. The character output interface 40 may also include the text 42 of the recognition result, or may hide it at first and display it only when the user chooses to show it.

As shown in FIG. 4, after recognition is complete, the signal and the text are segmented and aligned in units of word units 420 such as "today", "weather", and "how". Each word unit is displayed in a color that indicates the quality of its recognition result. In this embodiment, each word unit 420 is displayed in green, yellow, or red. Green indicates good recognition quality. Yellow warns of poor recognition quality. Red indicates very poor recognition quality, meaning the result needs to be inspected and corrected. The user can thus see at a glance how good the recognition result is for each word unit and carry out appropriate error correction or feedback adjustment of the system.
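The three-level color coding can be sketched as a simple threshold function. The specification does not say how recognition quality is measured, so this sketch assumes a per-word confidence score between 0 and 1 and uses arbitrary thresholds; both are illustrative assumptions.

```python
def quality_color(confidence: float, good: float = 0.8, poor: float = 0.5) -> str:
    """Map an assumed per-word confidence score to green, yellow, or red."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= good:
        return "green"   # good recognition quality
    if confidence >= poor:
        return "yellow"  # poor recognition quality, shown as a warning
    return "red"         # very poor quality, needs inspection and correction


for word, score in [("today", 0.93), ("weather", 0.61), ("how", 0.22)]:
    print(word, quality_color(score))
```

The same color would be applied to the word unit 420 and to its corresponding waveform unit 320.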

Each waveform unit 320 is also linked to a feedback adjustment selection item set.

The set includes at least one selection item and lets the user re-record, correct the recording, correct a speech recognition error, or perform feedback adjustment of the speech recognition system. As shown in FIG. 5, each waveform unit 320 is linked to a first feedback adjustment selection item set 50, which includes at least one first feedback adjustment selection item 52. In this embodiment, the first feedback adjustment selection item set 50 includes first feedback adjustment selection items 52 such as "repeat", "re-record", "acceptance adjustment", "change to handwriting input", and "change to keyboard input".

After recognition is complete, the user moves the mouse pointer 22 displayed on the display device 20 onto the waveform unit 320 to be adjusted. Either automatically or when the user points with a mouse or touch pad, the first feedback adjustment selection item set 50 linked to that waveform unit 320 is displayed on the display device 20. The user then selects the required first feedback adjustment selection item 52 to correct the recognition result or to perform feedback adjustment of the speech recognition system.

For example, when the user notices an abnormality in the waveform, the user selects "repeat" from the first feedback adjustment selection items 52 to play back the speech signal and check for noise interference. Alternatively, when the recognized text deviates considerably from what was said, the user can select "repeat" to listen to the input speech signal again and locate the cause, for example a deviation in pronunciation.

If a problem is confirmed, the user selects "re-record" and inputs the speech signal again. If the deviation in the recognized text stems from the user's own pronunciation habits, the user can select "acceptance adjustment" so that the speech recognition system is adjusted to suit that user.

Until the speech recognition system has been adjusted to recognize the word clearly, the user may decide to change the input mode, for example by selecting "change to handwriting input" or "change to keyboard input" to switch from speech input to handwriting or keyboard input and complete the input.
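The feedback adjustment selection item set attached to a waveform unit behaves like a context menu whose entries trigger actions on that unit. The sketch below illustrates this with a dictionary of handlers. The menu labels come from the embodiment; the function names and the handler bodies are hypothetical stand-ins for real playback, recording, and adaptation logic.

```python
from typing import Callable, Dict, List

# Menu labels taken from the first feedback adjustment selection item set 50.
FIRST_MENU_ITEMS: List[str] = [
    "repeat",
    "re-record",
    "acceptance adjustment",
    "change to handwriting input",
    "change to keyboard input",
]

Handler = Callable[[int], str]


def build_menu(handlers: Dict[str, Handler]) -> List[str]:
    """List the menu items that have a handler, in the documented order."""
    return [item for item in FIRST_MENU_ITEMS if item in handlers]


def select_item(item: str, unit_index: int, handlers: Dict[str, Handler]) -> str:
    """Run the handler for the chosen item on the pointed-at waveform unit."""
    if item not in handlers:
        raise KeyError(f"no handler for menu item: {item}")
    return handlers[item](unit_index)


# Hypothetical handlers; a real system would play audio, record, or adapt a model.
handlers = {
    "repeat": lambda i: f"play back waveform unit {i}",
    "re-record": lambda i: f"record waveform unit {i} again",
}
print(build_menu(handlers))
print(select_item("repeat", 0, handlers))
```

Keeping the menu as data makes it straightforward to offer a different item set for word units than for waveform units.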

Each word unit 420 in the displayed text result is also linked to a feedback adjustment selection item set. The set includes at least one feedback adjustment selection item that the user can select to correct a speech recognition error or to perform feedback adjustment of the speech recognition system.

As shown in FIG. 6, each word unit 420 is linked to a second feedback adjustment selection item set 60, which includes at least one feedback selection item 62. In this embodiment, the second feedback adjustment selection item set 60 includes feedback selection items 62 such as "display the next recognition option", "display recognition options in order of voice similarity", "display all options", "change to handwriting input", and "change to keyboard input". After recognition is complete, the user moves the mouse pointer 22 onto the word unit 420 to be adjusted. Either automatically or when the user points with a mouse or touch pad, the second feedback adjustment selection item set 60 linked to that word unit 420 is displayed on the display device 20. The user then selects the required feedback adjustment selection item 62 to perform feedback adjustment on the recognition result.

Because of pronunciation differences, the recognized text obtained from the user's speech signal can vary considerably. Taking FIG. 6 as an example, the user says "I want to eat rice", and the recognition result may differ depending on the user's pronunciation habits. Based on the input speech signal, the visual feedback system of the present invention lets the user choose among several approximate recognition results for that signal. The approximate results are obtained by selecting different feedback selection items 62 in the second feedback adjustment selection item set 60. For example, by selecting "next option", the user obtains the next candidate. By selecting "voice similarity priority", the user can find and obtain the most likely candidate based on the relationship between the preceding and following words. By selecting "display all options", the user can display all recognition candidates. The user can also choose another input mode, for example by selecting "switch to handwriting input" or "switch to keyboard input" to switch from speech input to handwriting or keyboard input and complete the input.
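The "next option" and "display all options" items can be pictured as operations on an ordered list of recognition candidates kept for each word unit. The sketch below assumes the engine supplies such a list, ordered best first; the class name, its methods, and the example candidates are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordUnitCandidates:
    alternatives: List[str]  # assumed engine output, best candidate first
    current: int = 0

    def __post_init__(self) -> None:
        if not self.alternatives:
            raise ValueError("at least one alternative is required")

    @property
    def text(self) -> str:
        """The candidate currently shown on the character output interface."""
        return self.alternatives[self.current]

    def next_option(self) -> str:
        """Advance to the next candidate, wrapping around at the end."""
        self.current = (self.current + 1) % len(self.alternatives)
        return self.text

    def all_options(self) -> List[str]:
        """Return every candidate so the user can pick one directly."""
        return list(self.alternatives)

    def choose(self, index: int) -> str:
        """Select a candidate by its position in the displayed list."""
        if not 0 <= index < len(self.alternatives):
            raise IndexError("no such alternative")
        self.current = index
        return self.text


word = WordUnitCandidates(["rice", "lice", "rise"])  # made-up candidates
print(word.text)
print(word.next_option())
print(word.all_options())
```

A "voice similarity priority" ordering would amount to sorting `alternatives` by a similarity score before display; that score is not defined in the specification and is left out of the sketch.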

Although preferred embodiments of the present invention have been disclosed above, they are not intended to limit the invention. Anyone skilled in the art may of course make various changes and modifications within an equivalent scope that does not depart from the spirit and scope of the present invention.

FIG. 1 is an explanatory diagram of the speech recognition system of the present invention.
FIG. 2 is an explanatory diagram of the first embodiment of the speech recognition system of the present invention.
FIG. 3 is another explanatory diagram of the first embodiment of the speech recognition system of the present invention.
FIG. 4 is another explanatory diagram of the first embodiment of the speech recognition system of the present invention.
FIG. 5 is a use-state diagram of the first embodiment of the speech recognition system of the present invention.
FIG. 6 is another use-state diagram of the first embodiment of the speech recognition system of the present invention.

Explanation of symbols

10 Speech recognition engine
20 Display device
22 Mouse pointer
30 Signal indicator interface
32, 321, 322 Waveform
320 Waveform unit
40 Character output interface
42 Text
420 Word unit
50 First feedback adjustment selection item set
52 First feedback adjustment selection item
60 Second feedback adjustment selection item set
62 Second feedback adjustment selection item

Claims

1. A speech recognition system comprising at least one speech recognition engine and a display device, wherein the display device is provided with:
a signal indicator interface that represents the speech signal input by the user as a waveform and displays a recording state, a speech-recognition-in-progress state, and a speech-recognition-completed state; and
a character output interface that displays the recognized text result, the text result including at least two word units.

2. The speech recognition system according to claim 1, wherein the waveforms for the recording state, the speech-recognition-in-progress state, and the speech-recognition-completed state displayed on the signal indicator interface are displayed in different colors.

3. The speech recognition system according to claim 1, wherein each word unit of the recognized text result on the character output interface is displayed in a color that represents the speech recognition quality of that word unit.

4. The speech recognition system according to claim 3, wherein each word unit is displayed in green, yellow, or red, green indicating good speech recognition quality, yellow warning of poor speech recognition quality, and red indicating very poor speech recognition quality that requires strict inspection and correction.

5. The speech recognition system according to claim 3, wherein each word unit is linked to a feedback adjustment selection item set, and the set includes at least one feedback adjustment selection item that the user can select to correct a speech recognition error or to perform feedback adjustment of the speech recognition system.

6. The speech recognition system according to claim 5, wherein the feedback adjustment selection item set is displayed on the display device when the user moves a mouse pointer displayed on the display device onto the word unit to be adjusted, or points to it with a touch pad or mouse.

7. The speech recognition system according to claim 5, wherein the feedback selection items included in the feedback adjustment selection item set linked to the word unit are "next option", "display recognition options in order of voice similarity", "display all approximate recognition results", "change to handwriting input", "change to keyboard input", or any combination thereof.

8. The speech recognition system according to claim 3, wherein the recognition-completed waveform on the signal indicator interface further includes at least one waveform unit, each waveform unit corresponds to and is aligned with one word unit of the recognition result displayed by the character output interface, and each waveform unit is displayed in the same color as the speech recognition quality of that word unit.

9. The speech recognition system according to claim 8, wherein each waveform unit of the signal indicator interface is linked to a feedback adjustment selection item set, and the set includes at least one feedback adjustment selection item that lets the user re-record, correct the recording, correct a speech recognition error, or perform feedback adjustment of the speech recognition system.

10. The speech recognition system according to claim 9, wherein the feedback adjustment selection item set is displayed on the display device when the user moves a mouse pointer on the display device onto the waveform unit to be adjusted, or points to it with a touch pad or mouse.

11. The speech recognition system according to claim 9, wherein the feedback adjustment selection items included in the feedback adjustment selection item set are "play", "re-record", "acceptance adjustment", "change to handwriting input", "change to keyboard input", or any combination thereof.

12. The speech recognition system according to claim 1, wherein the system is a desktop personal computer, notebook personal computer, home multimedia system, television, DVD player, AV system, mobile phone, or PDA that has a display device, can be connected to another display device, or has a display device on a remote control.

13. The speech recognition system according to claim 9, wherein the word unit is a word, a classifier, or an idiom.