JP6468584B2

JP6468584B2 - Foreign language difficulty determination device

Info

Publication number: JP6468584B2
Application number: JP2014166158A
Authority: JP
Inventors: 弘信岡崎; 貫治渡邉; 敬介稲川; 晴彦新田; 和彦木戸; 衣里福田
Original assignee: 弘信岡崎; 衣里福田
Priority date: 2014-08-18
Filing date: 2014-08-18
Publication date: 2019-02-13
Anticipated expiration: 2034-08-18
Also published as: JP2016042158A

Description

The present invention relates to a foreign language difficulty determination device, and more particularly to a device that determines the listening difficulty level of foreign-language sentences.

With the spread of the Internet, text-accompanied audio data in foreign languages has become easy to obtain across many genres, such as news and movies, and foreign language learning systems that use such data as material for listening practice and other study have been put into practical use. In foreign language learning, it is important to select study material whose difficulty level matches the learner's language ability.
The learner's language ability is measured by vocabulary tests, listening tests, reading tests, and the like. The difficulty level of a foreign-language sentence to be studied, on the other hand, is conventionally indexed by, for example, looking up each word in a previously prepared per-word vocabulary level table and dividing the sum of the vocabulary levels of all words in the sentence by the total number of words.
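The conventional text-based index described above can be sketched as follows. This is a minimal illustration only: the function name `text_based_difficulty` and the contents of `VOCAB_LEVEL` are assumptions made for the example and do not come from the patent.

```python
from typing import Mapping, Sequence


def text_based_difficulty(words: Sequence[str], vocab_level: Mapping[str, int]) -> float:
    """Return (sum of the vocabulary levels of all words) / (total number of words)."""
    if not words:
        raise ValueError("the sentence must contain at least one word")
    # A word missing from the table raises KeyError; how unknown words are
    # handled is not stated in the source, so no fallback is assumed here.
    return sum(vocab_level[w] for w in words) / len(words)


# Hypothetical vocabulary level table (illustrative values only).
VOCAB_LEVEL = {"i": 1, "want": 1, "to": 1, "negotiate": 4}

tn = text_based_difficulty(["i", "want", "to", "negotiate"], VOCAB_LEVEL)
print(tn)  # 1.75
```

Because this index looks only at the text, two recordings of the same sentence always receive the same value, which is the limitation the invention addresses.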

In foreign-language listening, however, it is known that even the same word can be pronounced differently depending on the sentence that contains it: the pronunciation may be weakened or part of the sound may be dropped, which changes how easy the word is to hear. Conventional text-based difficulty determination therefore cannot accurately indicate the difficulty of listening practice, which makes it hard for learners to select study material that matches their own language ability.

Japanese Unexamined Patent Application Publication No. 2004-334699 (JP 2004-334699 A)

The present invention has been made in view of the above problem of the prior art, and its object is to provide a foreign language difficulty determination device that can correctly determine the listening difficulty level of a foreign-language sentence.

The invention according to claim 1 comprises: speech analysis means that receives the uttered speech of a foreign-language sentence and extracts features; speech recognition database storage means that stores information for speech recognition including an acoustic model; recognition candidate selection / recognition score calculation means that, based on the speech feature information extracted by the speech analysis means, refers to the speech recognition database storage means, selects recognition candidates word by word, and calculates a recognition score for each recognition candidate; recognized word determination means that determines the recognition candidate with the highest recognition score as the recognized word; and listening difficulty determination means that determines the listening difficulty level of the foreign-language sentence using the recognition score of each recognized word. The listening difficulty determination means includes hearing-difficulty conversion table storage means for converting a recognition score into one of a plurality of hearing-difficulty levels, and obtains the listening difficulty level by converting the recognition score of each recognized word into a hearing-difficulty level.
The invention according to claim 2 comprises: speech analysis means that receives the uttered speech of a foreign-language sentence and extracts features; speech recognition database storage means that stores information for speech recognition including an acoustic model; recognition candidate selection / recognition score calculation means that, based on the speech feature information extracted by the speech analysis means, refers to the speech recognition database storage means, selects recognition candidates word by word, and calculates a recognition score for each recognition candidate; recognized word determination means that determines the recognition candidate with the highest recognition score as the recognized word; and listening difficulty determination means that determines the listening difficulty level of the foreign-language sentence using the recognition score of each recognized word. The listening difficulty determination means includes hearing-difficulty conversion table storage means that stores a conversion table for converting a recognition score into one of a plurality of hearing-difficulty levels, and vocabulary level table storage means that stores a per-word vocabulary level table, and obtains the listening difficulty level by combining the hearing-difficulty level converted from the recognition score of each recognized word with the vocabulary level.
The invention according to claim 3 has text data storage means that stores the text data of the foreign-language sentence, and the recognition candidate selection / recognition score calculation means takes each word of the word string stored in the text data storage means as a recognition candidate.
The invention according to claim 4 comprises: speech analysis means that receives the uttered speech of a foreign-language sentence and extracts features; speech recognition database storage means that stores information for speech recognition including an acoustic model; recognition candidate selection / recognition score calculation means that, based on the speech feature information extracted by the speech analysis means, refers to the speech recognition database storage means, selects recognition candidates word by word, and calculates a recognition score for each recognition candidate; recognized word determination means that determines the recognition candidate with the highest recognition score as the recognized word; hearing-difficulty conversion table storage means that stores, for each word, a conversion table for converting a recognition score into a hearing-difficulty level; and listening difficulty determination means that obtains the listening difficulty level using the hearing-difficulty levels converted from the recognition score of each recognized word with reference to the per-word conversion tables.
In the invention according to claim 5, the listening difficulty determination means includes vocabulary level table storage means that stores a per-word vocabulary level table, and obtains the listening difficulty level by combining the hearing-difficulty level converted from the recognition score of each recognized word with the vocabulary level.

According to the present invention, speech recognition is used to measure information related to how hard a foreign language is to hear, so that the listening difficulty level can be determined correctly.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of an English listening difficulty determination device according to a first embodiment of the present invention (Embodiment 1).
FIG. 2 is an explanatory diagram of the contents stored in the hearing-difficulty conversion table storage unit in FIG. 1.
FIG. 3 is an explanatory diagram showing an example of the contents stored in the recognition result storage unit in FIG. 1.
FIG. 4 is a block diagram of an English listening difficulty determination device according to a second embodiment of the present invention (Embodiment 2).
FIG. 5 is an explanatory diagram of the method of creating the conversion table stored in the hearing-difficulty conversion table storage unit in FIG. 4.
FIG. 6 is an explanatory diagram of the method of creating the conversion table stored in the hearing-difficulty conversion table storage unit in FIG. 4.

The best mode for carrying out the present invention is described below on the basis of embodiments.

FIG. 1 is a block diagram showing the configuration of an English listening difficulty determination device according to the first embodiment of the present invention.
This device uses speech recognition technology to obtain information that correlates with how hard English pronunciation is to hear, and determines the English listening difficulty level from that information.
In FIG. 1, reference numeral 1 denotes a foreign-language sentence data storage unit that stores arbitrary English text-accompanied audio data, such as news and movies, obtained from websites via the Internet or read from media such as DVDs. Reference numeral 2 denotes a speech analysis unit that receives the English audio data read from the foreign-language sentence data storage unit 1 by reading means (not shown), analyzes it, and extracts speech features (for example, LPC cepstrum). Reference numeral 3 denotes a speech recognition database storage unit that stores a speech recognition database including the per-word standard acoustic models used for English speech recognition. Reference numeral 4 denotes a recognition candidate selection / recognition score calculation unit that, based on the speech feature information extracted by the speech analysis unit 2 and with reference to the speech recognition database, selects a plurality of recognition candidate words (called recognition hypotheses) word by word and calculates, by a technique such as DP matching, a recognition score indicating the similarity between each recognition candidate word and the standard acoustic model (an acoustic score taking a value from 0 to 100; the larger the value, the more likely the candidate is correct). Reference numeral 5 denotes a recognized word determination unit that, word by word, determines the candidate with the highest recognition score as the recognized word and outputs it paired with its recognition score. Reference numeral 6 denotes a recognition result storage unit that temporarily stores the sequence of (recognized word Wi, recognition score RSi) pairs output from the recognized word determination unit 5. Reference numeral 7 denotes a vocabulary level table storage unit that stores the vocabulary level of each word.
Reference numeral 8 denotes a hearing-difficulty conversion table storage unit that stores a conversion table for converting a recognition score into one of a plurality of hearing-difficulty levels; for example, the conversion table shown in FIG. 2 is stored. In FIG. 2, a larger hearing-difficulty level means the word is harder to hear. Reference numeral 9 denotes a listening difficulty determination unit that determines the listening difficulty level based on the recognition results stored in the recognition result storage unit 6, with reference to the vocabulary level table storage unit 7, the hearing-difficulty conversion table storage unit 8, and so on. It obtains a first listening difficulty level LN1 based on the recognition score (hearing-difficulty level) and a second listening difficulty level LN2 based on a combination of the recognition score (hearing-difficulty level) and the vocabulary level. The first and second listening difficulty levels LN1 and LN2 are described later. Reference numeral 10 denotes a display unit, and reference numeral 11 denotes a display processing unit that causes the display unit 10 to display the first and second listening difficulty levels LN1 and LN2, the English text of the recognition result, and so on.
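The role of the conversion table held in storage unit 8 can be sketched as a simple range lookup. The actual contents of FIG. 2 are not reproduced in the text, so the seven levels and the score boundaries in `_LOWER_BOUNDS` below are purely hypothetical; only the direction (a lower recognition score maps to a higher, harder level) follows the description.

```python
import bisect

# Hypothetical lower score bounds, ascending. They split the 0-100 score range
# into seven bands; these values are NOT taken from FIG. 2.
_LOWER_BOUNDS = [40, 50, 60, 70, 80, 90]


def hearing_difficulty_level(score: float) -> int:
    """Map a recognition score (0-100) to a level 1 (easy) .. 7 (hard to hear)."""
    if not 0 <= score <= 100:
        raise ValueError("recognition score must be within 0..100")
    # bisect_right counts how many bounds are <= score (0..6).
    return 7 - bisect.bisect_right(_LOWER_BOUNDS, score)


print(hearing_difficulty_level(95))  # 1
print(hearing_difficulty_level(39))  # 7
```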

The operation of the above embodiment is described next.
It is assumed that arbitrary English text-accompanied audio data has already been stored in the foreign-language sentence data storage unit 1. The speech analysis unit 2 receives the audio data read from the foreign-language sentence data storage unit 1 by reading means (not shown), performs feature extraction, and outputs the feature information to the recognition candidate selection / recognition score calculation unit 4. Based on the feature information, the recognition candidate selection / recognition score calculation unit 4 refers to the speech recognition database storage unit 3, selects, word by word, recognition candidate words similar to the feature information, and calculates their recognition scores. Suppose here that two recognition candidates, w11 and w12, are found for the first word of the sentence, with recognition scores rs11 and rs12 respectively. The recognized word determination unit 5 determines the candidate with the highest recognition score as the recognized word W1 and stores the pair (recognized word W1, recognition score RS1) in the recognition result storage unit 6 as the first recognition result. The recognition candidate selection / recognition score calculation unit 4 then refers to the speech recognition database storage unit 3 based on the speech feature information of the portion following the first recognition result, selects recognition candidates for the second word, and calculates their recognition scores. The recognized word determination unit 5 determines the candidate with the highest recognition score among the candidates for the second word as the recognized word, and additionally stores the pair (recognized word W2, recognition score RS2) in the recognition result storage unit 6 as the second recognition result.
The same processing is repeated until the end of the audio data. As a result, the contents stored in the recognition result storage unit 6 are assumed to be as shown in FIG. 3.
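The selection step performed by the recognized word determination unit 5 can be sketched as below, assuming the candidates and their scores have already been produced by an upstream recognizer. The candidate words and scores are invented for illustration, and `determine_recognized_words` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Candidate = Tuple[str, float]  # (candidate word, recognition score 0-100)


def determine_recognized_words(
    candidates_per_word: Sequence[Sequence[Candidate]],
) -> List[Candidate]:
    """For each word position, keep the candidate with the highest score."""
    results: List[Candidate] = []
    for candidates in candidates_per_word:
        if not candidates:
            raise ValueError("each word position needs at least one candidate")
        # The pair (recognized word Wi, recognition score RSi) is appended in
        # order, mirroring what is stored in the recognition result storage unit.
        results.append(max(candidates, key=lambda c: c[1]))
    return results


# Hypothetical candidates: w11/w12 for the first word, then the second word.
hypotheses = [
    [("I", 92.0), ("eye", 61.0)],
    [("want", 74.0), ("won't", 70.5)],
]
recognized = determine_recognized_words(hypotheses)
print(recognized)  # [('I', 92.0), ('want', 74.0)]
```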

Next, the listening difficulty determination unit 9 converts the recognition score paired with each recognized word stored in the recognition result storage unit 6 into a hearing-difficulty level using the hearing-difficulty conversion table, and obtains the first listening difficulty level LN1 (difficulty based on the recognition score) by dividing the sum of the hearing-difficulty levels of the words by the total number of words. It also obtains the conventional text-based difficulty level TN by dividing the sum of the vocabulary levels of the words by the total number of words, and obtains the second listening difficulty level LN2 (difficulty combining the recognition score and the vocabulary level) as the weighted sum (a·LN1 + b·TN), where a and b are fixed weighting coefficients satisfying a + b = 1.
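The arithmetic for LN1, TN, and LN2 can be sketched as follows. The per-word levels are assumed to have been looked up already; the input values and the default weight a = 0.5 are illustrative assumptions, since the source states only that a and b are fixed and sum to 1.

```python
from typing import Sequence, Tuple


def listening_difficulty(
    hearing_levels: Sequence[int],
    vocab_levels: Sequence[int],
    a: float = 0.5,  # assumed weight; the source only requires a + b = 1
) -> Tuple[float, float, float]:
    """Return (LN1, TN, LN2) for one sentence."""
    if not hearing_levels or len(hearing_levels) != len(vocab_levels):
        raise ValueError("need one hearing level and one vocabulary level per word")
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1] so that b = 1 - a is a valid weight")
    n = len(hearing_levels)
    ln1 = sum(hearing_levels) / n  # mean hearing-difficulty level
    tn = sum(vocab_levels) / n     # conventional text-based difficulty
    ln2 = a * ln1 + (1.0 - a) * tn # weighted sum a*LN1 + b*TN
    return ln1, tn, ln2


ln1, tn, ln2 = listening_difficulty([1, 3, 2, 6], [1, 1, 1, 4])
print(ln1, tn, ln2)  # 3.0 1.75 2.375
```

With a = 1 the result reduces to LN1, and with a = 0 it reduces to the text-based TN, so the weight controls how much the acoustic evidence contributes.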

The display processing unit 11 causes the display unit 10 to display the first and second listening difficulty levels LN1 and LN2 obtained by the listening difficulty determination unit 9, together with the word string of the recognition result.

According to this embodiment, speech recognition processing is performed on the audio data to measure a recognition score that correlates with how hard the speech is to hear, and the listening difficulty level of the English sentence is determined using the hearing-difficulty level converted from the recognition score. An accurate listening difficulty level is therefore obtained even when the pronunciation of a word changes from sentence to sentence. Specifically, the first listening difficulty level LN1 indicates the difficulty of the foreign-language sentence in terms of how hard its sounds are to hear, which gives a more accurate listening difficulty level than the conventional text-based difficulty level. The second listening difficulty level LN2 indicates the difficulty in terms of both how hard the sounds are to hear and how hard the meaning is to understand, which likewise gives a more accurate listening difficulty level than the conventional text-only difficulty level.

In the embodiment described above, the recognition candidate selection / recognition score calculation unit selects one or more recognition candidates for the next recognized word by referring to the acoustic model based on the speech feature information, and the candidate with the highest recognition score is determined as the recognized word. However, because the correct text data is stored in advance in the foreign-language sentence data storage unit 1, this text data may instead be used to determine a single next recognition candidate, and the recognition score for that candidate may be calculated with reference to the acoustic model.
The speech recognition database storage unit may also store a language model in addition to the acoustic model. In that case, the recognition candidate selection / recognition score calculation unit refers to both the acoustic model and the language model to select one or more recognition candidate words, calculates an acoustic score from the acoustic model and a language score from the language model for each candidate word, calculates a total score by, for example, taking a weighted average of the acoustic score and the language score, and uses this total score as the recognition score.
Although the recognition score is converted into seven hearing-difficulty levels above, it may instead be divided into six or fewer levels, or into eight or more levels.
The listening difficulty level may also be obtained by dividing the sum of the recognition scores of the recognized words by the total number of words.
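The language-model variant mentioned above can be sketched as a weighted average of two scores. This assumes that both scores are expressed on the same 0-100 scale and uses an arbitrary default weight; neither detail is given in the source, and `total_recognition_score` is a hypothetical name.

```python
def total_recognition_score(
    acoustic_score: float,
    language_score: float,
    w_acoustic: float = 0.5,  # assumed weight; not specified in the source
) -> float:
    """Weighted average of the acoustic score and the language score."""
    if not 0.0 <= w_acoustic <= 1.0:
        raise ValueError("w_acoustic must lie in [0, 1]")
    return w_acoustic * acoustic_score + (1.0 - w_acoustic) * language_score


print(total_recognition_score(80.0, 60.0))        # 70.0
print(total_recognition_score(80.0, 60.0, 0.25))  # 65.0
```

The resulting total score would then take the place of the purely acoustic recognition score in the later conversion to a hearing-difficulty level.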

FIG. 4 is a block diagram showing the configuration of an English listening difficulty determination device according to the second embodiment of the present invention; components identical to those in FIG. 1 are given the same reference numerals.
In the first embodiment, the conversion from recognition score to hearing-difficulty level is performed uniformly, using the same conversion table for every word. In the second embodiment, a different conversion table is used for each word. The magnitude of a speech recognition score is generally considered to correlate with how hard a word is to hear, but a comparative study of speech recognition and listening tests carried out by the present inventors showed that the degree of correlation differs from word to word. This is because learners rely not only on sound but also on non-acoustic cues such as grammar when listening: some words are easy to catch from non-acoustic cues even when the recognition score is low, and conversely, some words are hard to catch even when the recognition score is high because cues such as grammar are scarce.

Therefore, for each of a wide variety of words W1, W2, W3, W4, W5, ..., listening tests (audio data) LT1(Wi), LT2(Wi), LT3(Wi), LT4(Wi), ... are prepared in which the word Wi appears in different sentences. The listening tests for the word Wi are given to a large number of subjects, and the wrong-answer rates F-LT1(Wi), F-LT2(Wi), F-LT3(Wi), F-LT4(Wi), ... are determined for each of the listening tests LT1(Wi), LT2(Wi), LT3(Wi), LT4(Wi), .... In addition, for each listening test (audio data), the audio data is input to the speech recognition section of the device of the first embodiment, and the recognition scores RS-LT1(Wi), RS-LT2(Wi), RS-LT3(Wi), RS-LT4(Wi), ... of the word Wi are measured (see FIG. 5).

Then, a graph showing the relationship between the recognition score and the wrong-answer rate for the word Wi is drawn as in FIG. 6(1), with the recognition score on the horizontal axis and the wrong-answer rate on the vertical axis. Because the wrong-answer rate represents how hard the word is to hear, if conversion into, for example, six hearing-difficulty levels is desired, a conversion table is created as in FIG. 6(2) and stored in the hearing-difficulty conversion table storage unit in association with the word Wi. For the other words W1, W2, W3, W4, ... besides Wi, the same listening tests and recognition score measurements are carried out, conversion tables like that of FIG. 6 are created, and each is stored in the hearing-difficulty conversion table storage unit in association with its word. In this embodiment, the conversion table of FIG. 6(2) is assumed to be stored.
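One possible way to turn the measured (recognition score, wrong-answer rate) pairs for a word into a score-to-level conversion is sketched below. The source does not say how the curve of FIG. 6(1) is divided into the levels of FIG. 6(2), so the linear interpolation between measurements, the equal-width bands of wrong-answer rate, and the sample data are all assumptions made for this example.

```python
import bisect
from typing import Sequence, Tuple

Measurement = Tuple[float, float]  # (recognition score, wrong-answer rate 0..1)


def error_rate_to_level(rate: float, levels: int = 6) -> int:
    """Split the wrong-answer rate range [0, 1] into equal bands (assumed rule)."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("wrong-answer rate must be within 0..1")
    return min(levels, int(rate * levels) + 1)


def interpolate_error_rate(points: Sequence[Measurement], score: float) -> float:
    """Estimate the wrong-answer rate at a score by linear interpolation."""
    pts = sorted(points)
    scores = [s for s, _ in pts]
    if score <= scores[0]:
        return pts[0][1]
    if score >= scores[-1]:
        return pts[-1][1]
    i = bisect.bisect_right(scores, score)  # first measurement with a larger score
    (s0, f0), (s1, f1) = pts[i - 1], pts[i]
    return f0 + (f1 - f0) * (score - s0) / (s1 - s0)


def word_level(points: Sequence[Measurement], score: float) -> int:
    """Hearing-difficulty level for one word at a given recognition score."""
    return error_rate_to_level(interpolate_error_rate(points, score))


# Hypothetical measurements for one word Wi across four listening tests.
measurements_wi = [(30.0, 0.9), (50.0, 0.6), (70.0, 0.3), (90.0, 0.1)]
print([word_level(measurements_wi, s) for s in (30.0, 60.0, 95.0)])  # [6, 3, 1]
```

Repeating this for each word, with that word's own measurements, would yield one conversion per word, which is the property the second embodiment relies on.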

In FIG. 4, reference numeral 1 denotes a foreign-language sentence data storage unit that stores arbitrary English text-accompanied audio data, such as news and movies, obtained from websites via the Internet or read from media such as DVDs. Reference numeral 2 denotes a speech analysis unit that receives the audio data read from the foreign-language sentence data storage unit 1, analyzes it, and extracts speech features (for example, LPC cepstrum). Reference numeral 3 denotes a speech recognition database storage unit that stores a speech recognition database including the per-word standard acoustic models used for English speech recognition. Reference numeral 4 denotes a recognition candidate selection / recognition score calculation unit that, based on the speech feature information extracted by the speech analysis unit 2 and with reference to the speech recognition database, selects a plurality of recognition candidate words (called recognition hypotheses) word by word and calculates, by a technique such as DP matching, a recognition score indicating the similarity between each recognition candidate word and the standard acoustic model (an acoustic score taking a value from 0 to 100; the larger the value, the more likely the candidate is correct).
Reference numeral 5 denotes a recognized word determination unit that, word by word, determines the candidate with the highest recognition score as the recognized word and outputs it paired with its recognition score. Reference numeral 6 denotes a recognition result storage unit that temporarily stores the sequence of (recognized word Wi, recognition score RSi) pairs output from the recognized word determination unit 5. Reference numeral 7 denotes a vocabulary level table storage unit that stores the vocabulary level of each word. Reference numeral 8A denotes a hearing-difficulty conversion table storage unit that stores, for each word, a conversion table for converting a recognition score into one of a plurality of hearing-difficulty levels; a conversion table as described with reference to FIG. 5 is stored for each word. Reference numeral 9A denotes a listening difficulty determination unit that, based on the recognition results stored in the recognition result storage unit 6 and with reference to the vocabulary level table storage unit 7, the hearing-difficulty conversion table storage unit 8A, and so on, obtains a first listening difficulty level LN1' based on the recognition score (hearing-difficulty level) and a second listening difficulty level LN2' based on a combination of the recognition score (hearing-difficulty level) and the vocabulary level. The first and second listening difficulty levels LN1' and LN2' are described later. Reference numeral 10 denotes a display unit, and reference numeral 11 denotes a display processing unit that causes the display unit 10 to display the first and second listening difficulty levels LN1' and LN2', the English text of the recognition result, and so on.

The operation of the above embodiment is described next.
It is assumed that arbitrary English text-accompanied audio data has already been stored in the foreign-language sentence data storage unit 1.
As in the first embodiment, the speech analysis unit 2 receives the audio data, performs feature extraction, and outputs the feature information to the recognition candidate selection / recognition score calculation unit 4. Based on the feature information, the recognition candidate selection / recognition score calculation unit 4 refers to the speech recognition database storage unit 3, selects, word by word, recognition candidate words similar to the feature information, and calculates their recognition scores. Suppose here that two recognition candidates, w11 and w12, are found for the first word of the sentence, with recognition scores rs11 and rs12 respectively. The recognized word determination unit 5 determines the candidate with the highest recognition score as the recognized word W1 and stores the pair (recognized word W1, recognition score RS1) in the recognition result storage unit 6 as the first recognition result. The recognition candidate selection / recognition score calculation unit 4 then refers to the speech recognition database storage unit 3 based on the speech feature information of the portion following the first recognition result, selects recognition candidates for the second word, and calculates their recognition scores. The recognized word determination unit 5 determines the candidate with the highest recognition score among the candidates for the second word as the recognized word, and additionally stores the pair (recognized word W2, recognition score RS2) in the recognition result storage unit 6 as the second recognition result. The same processing is repeated until the end of the audio data. As a result, the contents stored in the recognition result storage unit 6 are assumed to be as shown in FIG. 3.

Next, the listening difficulty level determination unit 9A converts the recognition score paired with each recognition word stored in the recognition result storage unit 6 into a hard-to-hear level using a hard-to-hear level conversion table. For each word, the conversion table corresponding to that recognition word is used. The first listening difficulty level LN1' (the difficulty based on the recognition score) is then obtained as the sum of the hard-to-hear levels of the words divided by the total number of words. In addition, the text-based difficulty level TN, which is the same as the conventional one, is obtained as the sum of the vocabulary levels of the words divided by the total number of words, and the second listening difficulty level LN2' (the difficulty combining the recognition score and the vocabulary level) is obtained as the weighted sum a·LN1' + b·TN. Here, a and b are fixed weighting coefficients satisfying a + b = 1.
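The calculation of LN1', TN, and LN2' can be sketched in Python as follows. This is only an illustration under stated assumptions: the threshold values, the per-word table for "the", the sample scores, and the vocabulary levels are hypothetical, and the sketch assumes that a lower recognition score maps to a higher hard-to-hear level on a seven-level scale.

```python
from bisect import bisect_right

# Hypothetical default table: six ascending thresholds split scores into 7 bands.
DEFAULT_THRESHOLDS = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]


def hard_to_hear_level(score, thresholds=DEFAULT_THRESHOLDS):
    """Map a recognition score to a level (assumed: lower score, higher level)."""
    return len(thresholds) + 1 - bisect_right(thresholds, score)


def listening_difficulty(results, tables, vocab_levels, a=0.5, b=0.5):
    """Return (LN1', TN, LN2') for a list of (word, recognition score) pairs."""
    if abs(a + b - 1.0) > 1e-9:
        raise ValueError("a and b must satisfy a + b = 1")
    n = len(results)
    # LN1': sum of hard-to-hear levels / total number of words,
    # using the conversion table that corresponds to each word when one exists.
    ln1 = sum(
        hard_to_hear_level(score, tables.get(word, DEFAULT_THRESHOLDS))
        for word, score in results
    ) / n
    # TN: sum of vocabulary levels / total number of words.
    tn = sum(vocab_levels[word] for word, _ in results) / n
    # LN2': weighted sum a * LN1' + b * TN.
    return ln1, tn, a * ln1 + b * tn


# Hypothetical recognition results, per-word table, and vocabulary levels.
results = [("the", 0.25), ("committee", 0.65), ("adjourned", 0.45)]
tables = {"the": [0.05, 0.1, 0.15, 0.2, 0.3, 0.4]}
vocab_levels = {"the": 1, "committee": 4, "adjourned": 6}
ln1, tn, ln2 = listening_difficulty(results, tables, vocab_levels)
```

With these sample values the three words receive levels 3, 2, and 4, so LN1' is 3.0, TN is 11/3, and LN2' is their equally weighted mean.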

The display processing unit 11 causes the display unit 10 to display the first and second listening difficulty levels LN1' and LN2' obtained by the listening difficulty level determination unit 9A, together with the word string of the recognition result.

According to the second embodiment, as in the first embodiment, speech recognition processing is performed on the voice data to measure a recognition score that correlates with how hard the speech is to hear, and the listening difficulty level of the English sentence is determined using the hard-to-hear level converted from the recognition score. An accurate listening difficulty level is therefore obtained even when the pronunciation of a word changes depending on the sentence. Specifically, the first listening difficulty level LN1' indicates the difficulty of the foreign language sentence in terms of how hard its sounds are to hear, which gives a more accurate listening difficulty level than the conventional text-based difficulty level. The second listening difficulty level LN2' indicates the difficulty of the foreign language sentence in terms of both how hard its sounds are to hear and how hard its meaning is to understand, which likewise gives a more accurate listening difficulty level than the conventional text-only difficulty level.
Moreover, because a different hard-to-hear conversion table is used for each word, the hard-to-hear level can be matched to the characteristics of each word. Examples are words that are easy to catch from cues other than sound even when the recognition score is low, and words that are hard to catch even when the recognition score is high because other cues such as grammar are scarce. This yields a very accurate listening difficulty level.

In the second embodiment described above as well, the recognition candidate selection / recognition score calculation unit selects one or more recognition candidates by referring to the acoustic model based on the speech feature information in order to obtain the next recognition word, and the candidate with the highest recognition score is determined as the recognition word. However, because the correct text data is stored in advance in the foreign language sentence data storage unit 1, this text data may instead be used to determine a single next recognition candidate, and the recognition score for that candidate may be calculated by referring to the acoustic model.
Alternatively, the speech recognition database storage unit may store a language model in addition to the acoustic model. In that case, the recognition candidate selection / recognition score calculation unit refers to the acoustic model and the language model to select one or more recognition candidate words, calculates an acoustic score from the acoustic model and a language score from the language model for each candidate word, computes a total score by, for example, taking a weighted average of the acoustic score and the language score, and uses this total score as the recognition score.
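As an illustration of the variation that combines the two models, the following Python sketch computes a total score as a weighted average of an acoustic score and a language score and then picks the best candidate. The weight, the candidate words, and all score values are hypothetical; the specification only says that a weighted average or a similar combination may be used.

```python
def total_score(acoustic_score, language_score, acoustic_weight=0.5):
    """Weighted average of the acoustic score and the language score."""
    if not 0.0 <= acoustic_weight <= 1.0:
        raise ValueError("acoustic_weight must be between 0 and 1")
    return acoustic_weight * acoustic_score + (1.0 - acoustic_weight) * language_score


def best_candidate(candidates, acoustic_weight=0.5):
    """Return (word, total score) for the candidate with the highest total score.

    Each candidate is a tuple (word, acoustic score, language score).
    """
    scored = [
        (word, total_score(acoustic, language, acoustic_weight))
        for word, acoustic, language in candidates
    ]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical candidates for one word position.
candidates = [("write", 0.60, 0.20), ("right", 0.55, 0.70)]
word, score = best_candidate(candidates)
```

Here the acoustically weaker candidate wins because its language score is much higher, which is the kind of effect that adding a language model is meant to capture.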
Although the recognition score is converted into seven hard-to-hear levels, it may instead be divided into six or fewer levels or eight or more levels.
In each of the embodiments described above, the voice data read from the foreign language sentence data storage unit is input to the voice analysis unit. However, a foreign language speech signal input through a microphone may be A/D converted and input to the voice analysis unit, or foreign language voice data downloaded from a website on the Internet or read from a recording medium may be input to the voice analysis unit as is.

The present invention is applicable to listening learning devices and listening learning support systems for a wide variety of foreign languages, including German, French, and Italian as well as English.

2 Voice analysis unit
3 Speech recognition database storage unit
4 Recognition candidate selection / recognition score calculation unit
5 Recognition word determination unit
7 Vocabulary level table storage unit
8, 8A Hard-to-hear level conversion table storage unit
9, 9A Listening difficulty level determination unit
10 Display unit

Claims

1. A foreign language difficulty level determination device comprising:
speech analysis means for receiving speech of a foreign language sentence and extracting its features;
speech recognition database storage means for storing speech recognition information including an acoustic model;
recognition candidate selection / recognition score calculation means for selecting recognition candidates for each word and calculating a recognition score for each recognition candidate by referring to the speech recognition database storage means based on the speech feature information extracted by the speech analysis means;
recognition word determination means for determining the recognition candidate having the highest recognition score as a recognition word; and
listening difficulty level determination means for determining the listening difficulty level of the foreign language sentence using the recognition score of each recognition word,
wherein the listening difficulty level determination means includes hard-to-hear conversion table storage means for storing a conversion table for converting a recognition score into a multi-stage hard-to-hear level, and obtains the listening difficulty level using the hard-to-hear level converted from the recognition score of each recognition word.

2. A foreign language difficulty level determination device comprising:
speech analysis means for receiving speech of a foreign language sentence and extracting its features;
speech recognition database storage means for storing speech recognition information including an acoustic model;
recognition candidate selection / recognition score calculation means for selecting recognition candidates for each word and calculating a recognition score for each recognition candidate by referring to the speech recognition database storage means based on the speech feature information extracted by the speech analysis means;
recognition word determination means for determining the recognition candidate having the highest recognition score as a recognition word; and
listening difficulty level determination means for determining the listening difficulty level of the foreign language sentence using the recognition score of each recognition word,
wherein the listening difficulty level determination means includes hard-to-hear conversion table storage means for storing a conversion table for converting a recognition score into a multi-stage hard-to-hear level, and vocabulary level table storage means for storing a per-word vocabulary level table, and obtains the listening difficulty level by combining the hard-to-hear level converted from the recognition score of each recognition word with the vocabulary level.

3. The foreign language difficulty level determination device according to claim 1 or 2, further comprising text data storage means for storing text data of the foreign language sentence,
wherein the recognition candidate selection / recognition score calculation means sets each word of the word string stored in the text data storage means as a recognition candidate.

4. A foreign language difficulty level determination device comprising:
speech analysis means for receiving speech of a foreign language sentence and extracting its features;
speech recognition database storage means for storing speech recognition information including an acoustic model;
recognition candidate selection / recognition score calculation means for selecting recognition candidates for each word and calculating a recognition score for each recognition candidate by referring to the speech recognition database storage means based on the speech feature information extracted by the speech analysis means;
recognition word determination means for determining the recognition candidate having the highest recognition score as a recognition word;
hard-to-hear conversion table storage means for storing, for each word, a conversion table for converting a recognition score into a hard-to-hear level; and
listening difficulty level determination means for obtaining a listening difficulty level using the hard-to-hear level converted from the recognition score of each recognition word by referring to the conversion table for that word.

5. The foreign language difficulty level determination device according to claim 4,
wherein the listening difficulty level determination means includes vocabulary level table storage means for storing a per-word vocabulary level table, and obtains the listening difficulty level by combining the hard-to-hear level converted from the recognition score of each recognition word with the vocabulary level.