JPS59103171A

JPS59103171A - machine translation device

Info

Publication number: JPS59103171A
Application number: JP57212024A
Authority: JP
Inventors: Hiroshi Kushima; 串間　洋
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1982-12-02
Filing date: 1982-12-02
Publication date: 1984-06-14
Also published as: JPH0225215B2

Abstract

PURPOSE: To make it possible to select the translation of a polysemous word by comparing a subject field vector with the field vector of each candidate translation and selecting the translation whose field vector matches it most closely. CONSTITUTION: A field vector synthesis unit 11 uses a vector adder 23 to add, one after another, the field vectors V of the words prepared in a buffer 22, and stores the resulting vector S in a subject field vector register 24. A translation selection unit 7 then starts an inner product calculation unit 12, which uses a group of switching instructions 25 to read each translation field vector out of the buffer 22 and pass it to a group of inner product instructions 26. These compute the inner product between the vector S in the register 24 and each translation field vector. A comparison unit 13 compares these inner products using a group of maximum value detection instructions 27 and detects the largest one. A translation determination unit 14 uses a group of translation extraction instructions 28 to set, in an output buffer 29, the code of the translation whose field vector gives the largest inner product. The unit 7 repeats the same processing for every word of the source sentence and, once all translations have been extracted, delivers them through an output unit 8.

Description

[Detailed Description of the Invention]

(A) Technical Field of the Invention

The present invention relates to a machine translation device, and in particular to a device that uses an electronic computer to translate an input source text and output the result, and that can select the most appropriate translation with relative ease when a word appearing in the input source text has several possible translations.

(B) Technical Background and Problems

FIG. 1 shows an example configuration of a general machine translation device.

In a machine translation device, natural language is input as digital information, translated automatically (for example, from English into Japanese or from Japanese into English), and the result is output. In FIG. 1, control unit 1 fetches and executes instructions and controls the machine translation device as a whole. The source text to be translated, entered through input unit 2, is split into individual words by word division unit 3. For each word, dictionary search unit 4 searches word dictionary 10, which has been prepared in advance on magnetic disk device 9 or similar storage. Word dictionary 10 registers, for each word of the language being translated, its attributes, its translations, and so on. Next, syntax analysis unit 5 analyzes the sentence structure (subject, predicate, and so on) according to the search results. Based on that analysis, word order determination unit 6 determines the order in which the translations will be arranged once the words have been converted. When a word of the source text has several corresponding translations, translation selection unit 7 selects one of them. Output unit 8 arranges the translations chosen by translation selection unit 7 in the word order determined by word order determination unit 6, composes the translated text, and outputs it.

The present invention relates in particular to the translation selection step described above. For example, when the English word "function" is translated into Japanese, it must sometimes be rendered as 機能 (kinō, "function" in the sense of a role or capability) and sometimes as 関数 (kansū, "function" in the mathematical sense). When a word has several possible translations in this way, conventional machine translation devices have selected a translation using methods such as the following.

The first method outputs all of the translations corresponding to the word being translated and has a human choose among them. Because this requires manual effort, however, it defeats the purpose of machine translation, which is to translate automatically.

The second method prepares a separate word dictionary for each field in which the source text may be used, for example mathematics, sports, and so on. This method requires multiple word dictionaries, which is cumbersome, and its range of application is also limited, because the field of the text to be translated is often not known in advance (for example, when translating newspaper articles). As another approach, selection based on sophisticated semantic processing using a so-called knowledge base has been studied. However, that approach requires an extremely large amount of computer processing, so it can only be used on a large-scale system.

(C) Object and Constitution of the Invention

The present invention aims to solve the problems described above and to provide a machine translation device that quickly selects an appropriate translation without using sophisticated semantic processing. To this end, the invention assigns to each word registered in the word dictionary, and to each of that word's translations, a word field vector or a translation field vector that expresses, as strengths, how well the word or translation fits each field in which it may be used. A subject field vector is obtained by combining the word field vectors of the words appearing in the source text, and by comparing it with each translation field vector, the translation with the best-matching translation field vector is selected.

In other words, the machine translation device of the present invention inputs a source text, splits it into words, extracts the translations corresponding to the source words registered in a word dictionary, and composes and outputs a translated text. In this device, the word dictionary holds, for each registered word, word field vector information whose dimension is a predetermined number of fields and whose components are the degrees to which the word fits each of those fields. For each translation of the registered word, it also holds translation field vector information: a vector corresponding to the word field vector, whose components are the degrees to which that translation fits each field. The device is characterized by comprising a synthesis unit that combines the word field vectors obtained from the group of words being translated to generate a subject field vector, and a translation selection unit that compares the subject field vector produced by the synthesis unit with each of the translation field vectors for a word and determines the translation with the most similar translation field vector as the translation of that word. The invention is described below with reference to the drawings.

(D) Embodiment of the Invention

FIG. 2 shows an example configuration of the word dictionary according to the present invention, and FIGS. 3 and 4 are diagrams explaining field vectors. FIG. 5 shows the configuration of an embodiment of the present invention.

The word dictionary stores, registered in advance, each word of the source language to be translated together with its translations. Each word has a field structure such as the one shown in FIG. 2. Key 10-1 holds key information for the source-language word, such as its character code string. Attribute field 10-2 stores general information about the word, such as its part of speech. Word field vector field 10-3 stores vector information indicating how well the word fits each field in which it is used, as described in detail later. Translation field 10-4 stores the translations corresponding to key 10-1, as many as are appropriate for the word. In addition, translation field vector field 10-5 holds, for each translation, vector information indicating how well that translation fits each field in which it is used; this information is set when the word is registered.
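The record layout of FIG. 2 can be sketched as a small data structure. This is only an illustration under assumptions: the class and attribute names are hypothetical, the vector values are made up, and the consistency checks (one field vector per translation, a shared dimension n) are inferred from the description rather than stated in it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DictionaryEntry:
    """One word-dictionary record laid out like the fields of FIG. 2 (hypothetical names)."""

    key: str                                      # 10-1: source-language word
    attributes: Dict[str, str]                    # 10-2: part of speech and similar info
    word_field_vector: List[float]                # 10-3: fit of the word to each field
    translations: List[str]                       # 10-4: candidate translations
    translation_field_vectors: List[List[float]]  # 10-5: one field vector per translation

    def __post_init__(self) -> None:
        # Every translation needs its own field vector.
        if len(self.translations) != len(self.translation_field_vectors):
            raise ValueError("one field vector is required per translation")
        # All vectors share the same dimension n (the number of fields).
        n = len(self.word_field_vector)
        if any(len(v) != n for v in self.translation_field_vectors):
            raise ValueError("all field vectors must have the same dimension")


# Hypothetical entry with n = 2 fields, ordered (mathematics, sports).
function_entry = DictionaryEntry(
    key="function",
    attributes={"pos": "noun"},
    word_field_vector=[0.5, 0.1],
    translations=["機能", "関数"],
    translation_field_vectors=[[0.1, 0.2], [0.9, 0.0]],
)
```

Keeping the translations and their vectors in parallel lists mirrors fields 10-4 and 10-5; a list of (translation, vector) pairs would work equally well.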

The word field vector, the translation field vector, and the subject field vector described later have the elements shown in FIG. 3. The dimension of the vectors is fixed by broadly classifying the fields in which language is used and taking the number of possible fields as the dimension. The meaning of each element is common to all vectors: for example, the first element corresponds to the field "mathematics", the second element to the field "sports", and so on.

For example, when the English word "reactor" is translated into Japanese, it is sometimes appropriate to translate it as 原子炉 (nuclear reactor) and sometimes necessary to translate it as 反応器 (reaction vessel). In general, 原子炉 is the appropriate translation in the field of nuclear engineering, whereas in the field of chemistry 反応器 will often be more appropriate. In this case, the translation field vector of the translation 原子炉 for the word "reactor" is set so that its nuclear engineering component has a large value, while the translation field vector of the translation 反応器 is set so that its chemistry component has a large value.
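The "reactor" example can be made concrete with two translation field vectors over a shared set of field axes. The four field names and all component values below are hypothetical; the text only says that 原子炉 should have a large nuclear engineering component and 反応器 a large chemistry component.

```python
from typing import Dict, List

# Hypothetical field axes, shared by every field vector (n = 4).
FIELDS = ["mathematics", "sports", "nuclear engineering", "chemistry"]

# Hypothetical translation field vectors for the two translations of "reactor".
REACTOR_TRANSLATIONS: Dict[str, List[float]] = {
    "原子炉": [0.0, 0.0, 0.9, 0.2],  # nuclear reactor: large nuclear-engineering component
    "反応器": [0.0, 0.0, 0.2, 0.9],  # reaction vessel: large chemistry component
}


def dominant_field(vector: List[float]) -> str:
    """Return the name of the field whose component is largest."""
    if len(vector) != len(FIELDS):
        raise ValueError("vector dimension must equal the number of fields")
    index = max(range(len(vector)), key=lambda k: vector[k])
    return FIELDS[index]


for translation, vector in REACTOR_TRANSLATIONS.items():
    print(translation, dominant_field(vector))
```

Running it prints `原子炉 nuclear engineering` and then `反応器 chemistry`.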

Word field vectors are defined in the same way, with the degree of fit to each field given as the value of each component. For example, words such as "that" or "this", which have no particular character and are used in every field alike, are assigned small values in every vector component.

If the fields are classified into n fields, a field vector V is given in an n-dimensional vector space, as shown in FIG. 4. That is, a field vector V has a direction and a strength, and can be regarded as expressing the character of a word or translation with respect to the fields in which it is used.
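One way to read "direction and strength" is as the Euclidean length of a field vector and its normalized unit vector. The sketch below takes that reading, which is an interpretation rather than something the text specifies, and uses hypothetical vectors to show that a general word such as "that" has a small strength compared with a specialized word.

```python
import math
from typing import List


def strength(v: List[float]) -> float:
    """Euclidean length of a field vector: how strongly the word leans toward any field."""
    return math.sqrt(sum(x * x for x in v))


def direction(v: List[float]) -> List[float]:
    """Unit vector giving the mix of fields the word is associated with."""
    s = strength(v)
    if s == 0.0:
        raise ValueError("a zero vector has no direction")
    return [x / s for x in v]


# Hypothetical vectors over (mathematics, sports, nuclear engineering, chemistry).
that_vector = [0.1, 0.1, 0.1, 0.1]     # general word: small in every field
reactor_vector = [0.0, 0.0, 0.8, 0.6]  # specialized word: strong in two fields

print(strength(that_vector), strength(reactor_vector))
print(direction(reactor_vector))
```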

Next, with reference to FIG. 5, the configuration of an embodiment of the present invention that determines translations on the basis of these field vectors is described.

In FIG. 5, reference numerals 2 to 10 correspond to those in FIG. 1. Reference numeral 11 denotes a field vector synthesis unit that combines the word field vectors of the words appearing in the source text to generate the subject field vector; 12 denotes an inner product calculation unit that computes the inner product of the subject field vector and each translation field vector; 13 denotes a comparison unit that compares the inner product results; 14 denotes a translation determination unit that determines and extracts a translation on the basis of the comparison result; 20 denotes an input buffer; 21 denotes the word data of the source text; 22 denotes a buffer; 23 denotes a vector adder; 24 denotes a subject field vector register; 25 denotes a group of switching instructions; 26 denotes a group of inner product instructions; 27 denotes a group of maximum value detection instructions; 28 denotes a group of translation extraction instructions; and 29 denotes an output buffer.

Input unit 2 reads the source text to be translated into input buffer 20. Word division unit 3 splits the input source text into words according to, for example, space characters.

For each word, dictionary search unit 4 searches word dictionary 10 and reads word information such as that shown in FIG. 2 into buffer 22. On the basis of this word information, syntax analysis unit 5 analyzes the sentence structure, and word order determination unit 6 determines the order in which the translations will be arranged once they have been selected. The processing up to this point may be the same as in conventional devices.

Field vector synthesis unit 11 uses vector adder 23 to add, one after another, the word field vectors V of the words prepared in buffer 22, and stores the result in subject field vector register 24. To prevent overflow during the addition, a suitable coefficient is applied. For example, let $S$ be the subject field vector, $m$ the number of words in the source text, and $V^{(i)}$ the word field vector of the $i$-th word; the subject field vector $S$ stored in subject field vector register 24 is then the sum $V^{(1)} + V^{(2)} + \cdots + V^{(m)}$ multiplied by this coefficient.

This subject field vector $S$ can be regarded as representing the characteristics of the field in which the source text being translated is actually used.
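The synthesis step can be sketched as a component-wise sum followed by scaling. The text does not give the coefficient explicitly; this sketch assumes it is $1/m$, so that $S = \frac{1}{m}\sum_{i=1}^{m} V^{(i)}$ (the mean word field vector). The function name and the sample vectors are hypothetical.

```python
from typing import List, Sequence


def synthesize_subject_vector(word_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Combine word field vectors V(1)..V(m) into a subject field vector S.

    Assumption: the overflow-preventing coefficient is taken to be 1/m.
    """
    m = len(word_vectors)
    if m == 0:
        raise ValueError("at least one word field vector is required")
    n = len(word_vectors[0])
    total = [0.0] * n
    for v in word_vectors:          # sequential addition, in the role of vector adder 23
        if len(v) != n:
            raise ValueError("all word field vectors must have the same dimension")
        for k in range(n):
            total[k] += v[k]
    return [x / m for x in total]   # the value that register 24 would hold


# Hypothetical word field vectors for "chemical" and "reactor" over
# (mathematics, sports, nuclear engineering, chemistry).
S = synthesize_subject_vector([[0.0, 0.0, 0.1, 0.9], [0.0, 0.0, 0.5, 0.5]])
```

Because $1/m$ is positive, this scaling does not change which translation later has the largest inner product with $S$; under this assumption the coefficient only keeps the values in range.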

Translation selection unit 7 first starts inner product calculation unit 12. When a word has several translations, inner product calculation unit 12 uses switching instruction group 25 to read the translation field vector of each translation from buffer 22 and passes it to inner product instruction group 26. Inner product instruction group 26 computes the inner product of subject field vector $S$ in subject field vector register 24 and each translation field vector. For example, if the translation field vector of the $j$-th translation of the $i$-th word is written $V_j^{(i)}$, the inner product $S \cdot V_j^{(i)}$ is calculated for each $j = 1, 2, \ldots, k$, where $k$ is the number of translations. The magnitude of each inner product indicates how closely the translation field vector $V_j^{(i)}$ agrees with the subject field vector $S$.
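The inner products $S \cdot V_j^{(i)}$ for $j = 1, \ldots, k$ can be sketched in plain Python. The function names and the sample vectors are hypothetical; the subject vector stands for a text that leans toward chemistry.

```python
from typing import List, Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two field vectors of the same dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def score_translations(subject: Sequence[float],
                       translation_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Return S . V_j(i) for each translation j = 1..k of the i-th word."""
    return [inner_product(subject, v) for v in translation_vectors]


# Hypothetical subject vector and the two "reactor" translation vectors.
S = [0.0, 0.0, 0.3, 0.7]
scores = score_translations(S, [[0.0, 0.0, 0.9, 0.2],   # 原子炉
                                [0.0, 0.0, 0.2, 0.9]])  # 反応器
```

Here the scores come out to about 0.41 for 原子炉 and 0.69 for 反応器.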

Comparison unit 13 uses maximum value detection instruction group 27 to compare the inner products $S \cdot V_j^{(i)}$ and detect the one with the largest value. Translation determination unit 14 uses translation extraction instruction group 28 to take the translation $T_{j_0}^{(i)}$, which corresponds to the translation field vector $V_{j_0}^{(i)}$ whose inner product with the subject field vector is largest, as the translation of the word, and sets the code of that translation in output buffer 29.

Translation selection unit 7 repeats the same processing for every word of the source text and, once translations have been extracted for all words, calls output unit 8. Output unit 8 prints the translation result held in output buffer 29 on, for example, a line printer.
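Putting the steps together for a whole sentence gives a loop like the one below. It is a simplified sketch with several assumptions: the dictionary layout and all names are hypothetical, every word is assumed to be registered, $S$ is taken to be the mean of the word field vectors, and word order determination is not modelled (translations come out in source order).

```python
from typing import Dict, List, Tuple

# Hypothetical dictionary layout:
# word -> (word field vector, [(translation, translation field vector), ...])
Entry = Tuple[List[float], List[Tuple[str, List[float]]]]


def translate_words(words: List[str], dictionary: Dict[str, Entry]) -> List[str]:
    """Select one translation per source word using a shared subject field vector.

    Assumes every word is registered and that S is the mean of the word
    field vectors; word order determination is not modelled here.
    """
    if not words:
        return []
    entries = [dictionary[w] for w in words]                          # dictionary search
    m = len(entries)
    n = len(entries[0][0])
    subject = [sum(e[0][k] for e in entries) / m for k in range(n)]  # synthesis
    output_buffer: List[str] = []
    for _word_vector, candidates in entries:                          # every word in turn
        scores = [sum(s * v for s, v in zip(subject, vec)) for _, vec in candidates]
        best = max(range(len(candidates)), key=scores.__getitem__)
        output_buffer.append(candidates[best][0])
    return output_buffer


# Hypothetical entries over (mathematics, sports, nuclear engineering, chemistry).
DICTIONARY: Dict[str, Entry] = {
    "nuclear": ([0.0, 0.0, 0.9, 0.1], [("原子力の", [0.0, 0.0, 0.9, 0.1])]),
    "chemical": ([0.0, 0.0, 0.1, 0.9], [("化学の", [0.0, 0.0, 0.1, 0.9])]),
    "reactor": ([0.0, 0.0, 0.5, 0.5], [("原子炉", [0.0, 0.0, 0.9, 0.2]),
                                       ("反応器", [0.0, 0.0, 0.2, 0.9])]),
}

print(translate_words(["chemical", "reactor"], DICTIONARY))
print(translate_words(["nuclear", "reactor"], DICTIONARY))
```

The same word "reactor" is rendered as 反応器 next to "chemical" and as 原子炉 next to "nuclear", because the neighbouring word shifts the subject field vector. Words with a single translation are simply taken as is.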

When field vector synthesis unit 11 combines word field vectors, it may do so for each sentence of the source text. Alternatively, the contents of subject field vector register 24 may be left uncleared from one sentence to the next, so that the range of synthesis is a series of sentences or a predetermined number of words.
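The three synthesis ranges (per sentence, a series of sentences, a fixed number of words) can be sketched as a small register class. The class and parameter names are hypothetical, the $1/m$ scaling is the same assumption as before, and using `collections.deque(maxlen=...)` for the fixed-size word window is one possible implementation choice, not something the text specifies.

```python
from collections import deque
from typing import Deque, List, Optional, Sequence


class SubjectFieldRegister:
    """Stand-in for register 24 with a configurable synthesis range (hypothetical).

    clear_per_sentence=True  -> restart at each sentence
    clear_per_sentence=False -> keep accumulating across sentences
    window=N                 -> keep only the N most recent word field vectors
    """

    def __init__(self, clear_per_sentence: bool = True, window: Optional[int] = None):
        self.clear_per_sentence = clear_per_sentence
        # maxlen=None means the deque is unbounded.
        self.vectors: Deque[List[float]] = deque(maxlen=window)

    def start_sentence(self) -> None:
        if self.clear_per_sentence:
            self.vectors.clear()

    def add(self, v: Sequence[float]) -> None:
        # When the window is full, the oldest vector drops out automatically.
        self.vectors.append(list(v))

    def vector(self) -> List[float]:
        m = len(self.vectors)
        if m == 0:
            raise ValueError("no word field vectors have been added")
        n = len(self.vectors[0])
        return [sum(v[k] for v in self.vectors) / m for k in range(n)]  # 1/m assumed


across = SubjectFieldRegister(clear_per_sentence=False)
across.start_sentence(); across.add([1.0, 0.0])
across.start_sentence(); across.add([0.0, 1.0])
print(across.vector())
```

Carrying the register across sentences lets context from earlier sentences influence later choices, at the cost of reacting more slowly when the subject of the text changes.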

The word field vectors registered in word dictionary 10 and the translation field vectors of each translation are set to suitable values, determined for example from experience, when word dictionary 10 is created; they may also be refined to more suitable values through learning.

(E) Effects of the Invention

As described in detail above, the present invention makes it possible to select the translation of a polysemous word without sophisticated semantic processing, using relatively little computer processing time and memory.

[Brief explanation of drawings]

FIG. 1 shows an example configuration of a general machine translation device; FIG. 2 shows an example configuration of the word dictionary according to the present invention; FIGS. 3 and 4 are diagrams explaining field vectors; and FIG. 5 shows the configuration of an embodiment of the present invention.

In the figures, 7 denotes the translation selection unit, 10 the word dictionary, 11 the field vector synthesis unit, 12 the inner product calculation unit, 13 the comparison unit, and 14 the translation determination unit.

Patent applicant: Fujitsu Limited

Claims

[Claims] A machine translation device that inputs a source text, splits the input source text into words, extracts the translations corresponding to the source words registered in a word dictionary, and composes and outputs a translated text, wherein the word dictionary holds, for each registered word, word field vector information whose dimension is a number of fields determined in advance and whose components are the degrees of fit between the word and each of those fields, and holds, for each translation of the registered word, translation field vector information, which is a vector corresponding to the word field vector whose components are the degrees of fit between that translation and each of those fields, the device comprising: a synthesis unit that combines the plurality of word field vectors obtained from the group of words to be translated to generate a subject field vector; and
a translation selection unit that compares the subject field vector generated by the synthesis unit with the plurality of translation field vectors for one word and determines the translation having the most similar translation field vector as the translation of that word.