JP6586055B2

JP6586055B2 - Deep case analysis device, deep case learning device, deep case estimation device, method, and program

Info

Publication number: JP6586055B2
Application number: JP2016138880A
Authority: JP
Inventors: 別所 克人; 平野 徹; 牧野 俊朗; 松尾 義博
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2016-07-13
Filing date: 2016-07-13
Publication date: 2019-10-02
Anticipated expiration: 2036-07-13
Also published as: JP2018010481A

Description

The present invention relates to a deep case analysis device, a deep case learning device, a deep case estimation device, a method, and a program.

Conventional deep case analysis techniques include rule-based methods built on case frames, as described in Non-Patent Document 1. For each verb, the pairs of noun semantic category and case particle that the verb can take, together with the deep case corresponding to each pair (called case frame information), are defined in advance. For example, for the verb 食べる ("eat"), entries such as (animal, が, nominative case) and (food, を, object case) are defined in the form (noun semantic category, case particle, deep case). Given an input (noun, case particle, verb), the deep case corresponding to the pair of the noun's semantic category and the case particle is looked up in the case frame information of the verb. For example, when (ケーキ "cake", を, 食べる "eat") is input, the semantic category of ケーキ is food, so the object case, which is the deep case corresponding to (food, を), is obtained from the case frame information of 食べる.
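The case frame lookup described above can be sketched as a pair of dictionary lookups. This is only an illustration of the conventional rule-based approach: the names `SEMANTIC_CATEGORY`, `CASE_FRAMES`, and `lookup_deep_case` are hypothetical, and the entry for 犬 is an assumption added to make the toy lexicon non-trivial.

```python
# Hypothetical toy lexicon: noun -> semantic category.
# Only ケーキ -> food comes from the text; 犬 is an assumed extra entry.
SEMANTIC_CATEGORY = {"ケーキ": "food", "犬": "animal"}

# Case frame information per verb:
# (noun semantic category, case particle) -> deep case.
CASE_FRAMES = {
    "食べる": {
        ("animal", "が"): "nominative",
        ("food", "を"): "object",
    },
}


def lookup_deep_case(noun, particle, verb):
    """Return the deep case from the verb's case frame, or None if no rule matches."""
    category = SEMANTIC_CATEGORY.get(noun)
    return CASE_FRAMES.get(verb, {}).get((category, particle))


print(lookup_deep_case("ケーキ", "を", "食べる"))  # rule found
print(lookup_deep_case("箸", "で", "食べる"))      # noun and rule both missing
```

The second call returns `None` because neither the noun nor a matching rule is registered, which is the kind of coverage gap that the rule-based approach has to close by hand.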

Makoto Nagao (ed.), "Natural Language Processing" (自然言語処理), Iwanami Shoten, Iwanami Lecture Series on Software Science 15, 1996.

Rule-based methods built on case frames are costly to construct, because case frame information must be built and semantic categories must be assigned to new words. In addition, combinations of (noun, case particle, verb) and deep case appear that are inconsistent with the semantic category system or the case frame information once these have been fixed. In such cases the correct deep case cannot be estimated, so the method lacks the robustness that deep case estimation requires. For example, a triple (noun semantic category X, case particle, deep case) may be absent from the case frame information yet need to be included for a particular noun whose semantic category is X. Conversely, such a triple may be present in the case frame information yet its deep case may not apply to a particular noun whose semantic category is X. For some nouns it may also be necessary to create a new semantic category X that fits none of the existing categories and then add a triple (noun semantic category X, case particle, deep case) to the case frame information.

An object of the present invention is to solve the above problems by providing a deep case analysis device, a deep case learning device, a deep case estimation device, a method, and a program for robustly estimating deep cases.

To solve the above problems, a deep case analysis device according to a first invention estimates, for a nominal phrase and a predicate phrase in a dependency relationship, which deep case the nominal of the nominal phrase fills for the predicate of the predicate phrase. The device includes: a labeled feature vector set generation unit that takes as input a set of labeled data, each item pairing data on a nominal phrase and a predicate phrase in a dependency relationship with the correct deep case for that data, and that generates a labeled feature vector set by generating, for each labeled data item, a feature vector that is a set of feature and feature value pairs from the data; a classification model generation unit that generates, from the labeled feature vector set, a classification model for classifying deep cases; a feature vector generation unit that takes as input data A on a nominal phrase and a predicate phrase in a dependency relationship and generates a feature vector B from the data A; and a classification unit that calculates, from the feature vector B and the classification model, a score indicating how well the data A corresponds to each deep case.

A deep case learning device according to a second invention includes: a labeled feature vector set generation unit that takes as input a set of labeled data, each item pairing data on a nominal phrase and a predicate phrase in a dependency relationship with the correct deep case for that data, and that generates a labeled feature vector set by generating, for each labeled data item, a feature vector that is a set of feature and feature value pairs from the data; and a classification model generation unit that generates, from the labeled feature vector set, a classification model for classifying deep cases.

A deep case estimation device according to a third invention includes: a feature vector generation unit that takes as input data A on a nominal phrase and a predicate phrase in a dependency relationship and generates from the data A a feature vector B that is a set of feature and feature value pairs; and a classification unit that calculates a score indicating how well the data A corresponds to each deep case, from the feature vector B and a classification model for classifying deep cases. The classification model is generated in advance from a labeled feature vector set, that is, the set of feature vectors generated for the labeled data items in a set of labeled data, each item pairing data on a nominal phrase and a predicate phrase in a dependency relationship with the correct deep case for that data.

A deep case analysis method according to a fourth invention is performed in a deep case analysis device that includes a labeled feature vector set generation unit, a classification model generation unit, a feature vector generation unit, and a classification unit, and that estimates, for a nominal phrase and a predicate phrase in a dependency relationship, which deep case the nominal of the nominal phrase fills for the predicate of the predicate phrase. The method includes: a step in which the labeled feature vector set generation unit takes as input a set of labeled data, each item pairing data on a nominal phrase and a predicate phrase in a dependency relationship with the correct deep case for that data, and generates a labeled feature vector set by generating, for each labeled data item, a feature vector that is a set of feature and feature value pairs from the data; a step in which the classification model generation unit generates, from the labeled feature vector set, a classification model for classifying deep cases; a step in which the feature vector generation unit takes as input data A on a nominal phrase and a predicate phrase in a dependency relationship and generates a feature vector B from the data A; and a step in which the classification unit calculates, from the feature vector B and the classification model, a score indicating how well the data A corresponds to each deep case.

A deep case learning method according to a fifth invention is performed in a deep case learning device that includes a labeled feature vector set generation unit and a classification model generation unit. The method includes: a step in which the labeled feature vector set generation unit takes as input a set of labeled data, each item pairing data on a nominal phrase and a predicate phrase in a dependency relationship with the correct deep case for that data, and generates a labeled feature vector set by generating, for each labeled data item, a feature vector that is a set of feature and feature value pairs from the data; and a step in which the classification model generation unit generates, from the labeled feature vector set, a classification model for classifying deep cases.

A deep case estimation method according to a sixth invention is performed in a deep case estimation device that includes a feature vector generation unit and a classification unit. The method includes: a step in which the feature vector generation unit takes as input data A on a nominal phrase and a predicate phrase in a dependency relationship and generates from the data A a feature vector B that is a set of feature and feature value pairs; and a step in which the classification unit calculates a score indicating how well the data A corresponds to each deep case, from the feature vector B and a classification model for classifying deep cases. The classification model is generated in advance from a labeled feature vector set, that is, the set of feature vectors generated for the labeled data items in a set of labeled data, each item pairing data on a nominal phrase and a predicate phrase in a dependency relationship with the correct deep case for that data.

A program of the present invention causes a computer to function as each unit of the above deep case analysis device, deep case learning device, or deep case estimation device, or causes a computer to execute each step of the above deep case analysis method, deep case learning method, or deep case estimation method.

As features of the present invention, surface strings, parts of speech, or semantic categories present in the data may be used.

As the set of feature and feature value pairs of the present invention, any of the following may be used: for a nominal in the data, the set of pairs each consisting of a pair (attached part of the nominal phrase, predicate phrase or predicate), taken from nominal phrases in a corpus that contain that nominal and the predicate phrases they are in a dependency relationship with, and its frequency; for a predicate in the data, the set of pairs each consisting of a nominal phrase that is in a dependency relationship in the corpus with a predicate phrase containing that predicate, and its frequency; for a predicate phrase in the data, the set of pairs each consisting of a nominal phrase that is in a dependency relationship with that predicate phrase in the corpus, and its frequency; or any of these sets in which entries whose predicates (in predicate phrases) or nominals (in nominal phrases) share the same semantic category, and whose other surface information is identical, are treated as the same entry with their frequencies summed.

As the set of feature and feature value pairs of the present invention, the concept vector of each morpheme in the data may also be used.

In the present invention, a classification model that reflects the overall tendency of the data is derived from a large amount of training data by a statistical method. Therefore, even if the training data contains some noise, such as defective feature values, the classification model remains accurate and deep cases can be estimated robustly. Furthermore, besides the semantic categories assigned to each word in advance, other features are available, including features that correspond to word meaning in the same way as semantic categories but can be acquired automatically. Therefore, even if semantic category assignment is incomplete, deep cases can be estimated accurately from the information in the other features, and the construction cost can be reduced compared with conventional methods.

According to the present invention, deep cases can be estimated robustly.

FIG. 1 is a block diagram showing the functional configuration of the deep case analysis device according to the embodiment of the present invention.
FIG. 2 is a diagram showing an example of a set of labeled data.
FIG. 3 is a diagram showing an example of a labeled feature vector set.
FIG. 4 is a diagram showing an example of data A, the target of deep case estimation.
FIG. 5 is a diagram showing an example of the feature vector B generated from the data A.
FIG. 6 is a diagram showing an example of a labeled data set when the attached part of the nominal phrase is fixed to で.
FIG. 7 is a diagram showing an example of data A when the attached part of the nominal phrase is fixed to で.
FIG. 8 is a diagram showing an example of the set of pairs of (attached part of the nominal phrase, predicate (dictionary form)) in a dependency relationship with the nominal 鉛筆 ("pencil") and their frequencies.
FIG. 9 is a diagram showing an example of the set of pairs of (nominal, attached part of the nominal phrase) in a dependency relationship with the predicate 書く ("write") and their frequencies.
FIG. 10 is a diagram showing an example of a co-occurrence vector generated using semantic categories.
FIG. 11 is a flowchart showing the learning processing routine in the learning unit of the deep case analysis device according to the embodiment of the present invention.
FIG. 12 is a flowchart showing the estimation processing routine in the estimation unit of the deep case analysis device according to the embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings.

<Outline of Embodiment of the Present Invention>
An embodiment of the present invention relates to a deep case analysis device, method, and program that estimate, for a nominal phrase and a predicate phrase in a dependency relationship, which deep case the nominal of the nominal phrase fills for the predicate of the predicate phrase.

In the embodiment of the present invention, a deep case represents the semantic role of a noun with respect to a predicate such as a verb. For example, in 部屋で箸で食べる ("eat with chopsticks in the room"), the nominal phrases 部屋で ("in the room") and 箸で ("with chopsticks") are each in a dependency relationship with the predicate phrase 食べる ("eat"). The surface case of the nominals 部屋 ("room") and 箸 ("chopsticks") is the で case in both phrases, but their deep cases for the predicate 食べる of the predicate phrase are the location case and the instrument case, respectively. Various inventories of deep cases have been proposed; examples include the nominative case, object case, instrument case, source case, goal case, location case, time case, and experiencer case. The embodiment concerns a deep case analysis technique that fixes a finite number of deep case types in advance and estimates the corresponding deep case for a nominal phrase and a predicate phrase in a dependency relationship. In the embodiment, predicate phrases also include "nominal + だ" forms such as 学生だ ("is a student").

<Configuration of Deep Case Analysis Device>
The configuration of the deep case analysis device according to the embodiment of the present invention is described below. FIG. 1 shows a configuration example of the deep case analysis device according to claim 1 of the present invention. As shown in FIG. 1, the deep case analysis device 100 according to the embodiment can be configured as a computer including a CPU, a RAM, and a ROM that stores various data and the programs for executing the processing routines described later. Functionally, the deep case analysis device 100 includes an input unit 10, a computation unit 20, and an output unit 30, as shown in FIG. 1.

The input unit 10 receives as input a set of labeled data, each item pairing data on a nominal phrase and a predicate phrase in a dependency relationship with the correct deep case for that data. The input unit 10 also receives as input data A on a nominal phrase and a predicate phrase in a dependency relationship. The set of labeled data and the data A are described later.

The computation unit 20 includes a learning unit 22, a classification model storage unit 24, and an estimation unit 26. The learning unit 22 takes the set of labeled data, which serves as training data, as input and generates classification models for classifying deep cases. After the learning unit 22 has finished, when data A on a nominal phrase and a predicate phrase in a dependency relationship is input via the input unit 10, the estimation unit 26 refers to the classification models and estimates the deep case corresponding to the data A.

The learning unit 22 includes a labeled feature vector set generation unit 220 and a classification model generation unit 222.

The labeled feature vector set generation unit 220 takes as input the set of labeled data received by the input unit 10, each item pairing data on a nominal phrase and a predicate phrase in a dependency relationship with the correct deep case for that data. FIG. 2 shows an example of a set of labeled data. Each labeled data item consists of the nominal, which is the independent part of the nominal phrase; the attached part of the nominal phrase; the predicate, which is the independent part of the predicate phrase; the attached part of the predicate phrase; and the deep case. For the nominal, only its last constituent morpheme may be used. In FIG. 2, predicates are shown in their dictionary form. When the predicate phrase has the form "nominal + だ", that nominal is taken as the predicate. The sixth data item is taken from ねずみが食べられる ("the mouse is eaten"); its predicate is 食べる ("eat"), the dictionary form of 食べ, which is the independent part of the predicate phrase 食べられる. For the predicate 食べる, the nominal ねずみ ("mouse") is in the object case.

The labeled data is created, for example, by running dependency parsing on a text corpus, extracting nominal phrases and predicate phrases in a dependency relationship, and assigning the correct deep case to each pair.

For each labeled data item, the labeled feature vector set generation unit 220 generates a feature vector, which is a set of feature and feature value pairs, from the nominal phrase and the predicate phrase that make up the data, and thereby generates a labeled feature vector set. FIG. 3 shows an example of a labeled feature vector set. The feature vector has N dimensions, and each feature value is a real number.

The classification model generation unit 222 generates classification models for classifying deep cases from the labeled feature vector set generated by the labeled feature vector set generation unit 220. Specifically, for each deep case, the labeled feature vector set is split into the feature vectors labeled with that deep case and the feature vectors not labeled with it, and a classification model for the binary classification of whether or not a vector belongs to that deep case is generated by a machine learning method such as a support vector machine. In this way, a corresponding classification model is generated for each deep case.
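The per-deep-case binary split can be sketched as follows. The text names a support vector machine as one possible learner; to keep the sketch dependency-free, a plain perceptron is swapped in as a stand-in, so this is not the learner the text describes. The function names, the sparse feature names such as `noun=箸`, and the toy training data are all assumptions made for illustration.

```python
from collections import defaultdict


def one_vs_rest_split(labeled_vectors, deep_case):
    """Relabel every vector as +1 (this deep case) or -1 (any other deep case)."""
    return [(x, 1 if y == deep_case else -1) for x, y in labeled_vectors]


def train_perceptron(binary_data, epochs=20):
    """Stand-in binary learner (NOT an SVM): a perceptron over sparse features."""
    weights, bias = defaultdict(float), 0.0
    for _ in range(epochs):
        for x, target in binary_data:
            score = bias + sum(weights[f] * v for f, v in x.items())
            if target * score <= 0:  # misclassified: shift the boundary
                for f, v in x.items():
                    weights[f] += target * v
                bias += target
    return dict(weights), bias


def train_models(labeled_vectors):
    """Train one binary model per deep case seen in the labeled data."""
    cases = sorted({y for _, y in labeled_vectors})
    return {c: train_perceptron(one_vs_rest_split(labeled_vectors, c)) for c in cases}


# Hypothetical labeled feature vectors (feature name -> feature value).
training = [
    ({"noun=箸": 1.0, "pred=食べる": 1.0}, "instrument"),
    ({"noun=部屋": 1.0, "pred=食べる": 1.0}, "location"),
    ({"noun=鉛筆": 1.0, "pred=書く": 1.0}, "instrument"),
    ({"noun=公園": 1.0, "pred=遊ぶ": 1.0}, "location"),
]
models = train_models(training)
print(sorted(models))
```

Each entry of `models` is a `(weights, bias)` pair for one deep case, mirroring the one-model-per-deep-case arrangement in the text.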

The classification model storage unit 24 stores the classification model generated for each deep case by the classification model generation unit 222.

The estimation unit 26 includes a feature vector generation unit 260 and a classification unit 262.

The feature vector generation unit 260 takes as input the data A on a nominal phrase and a predicate phrase in a dependency relationship received by the input unit 10. The data A has the same form as the data in the labeled data set that is input to the labeled feature vector set generation unit 220. FIG. 4 shows an example of the data A, which consists of the nominal (the independent part of the nominal phrase), the attached part of the nominal phrase, the predicate (the independent part of the predicate phrase), and the attached part of the predicate phrase.

The feature vector generation unit 260 generates the feature vector B from the data A using the same algorithm that the labeled feature vector set generation unit 220 uses to generate feature vectors from data. The feature vector B therefore has the same form as the feature vectors in the labeled feature vector set output by the labeled feature vector set generation unit 220. FIG. 5 shows an example of the feature vector B; the feature vector has N dimensions, and each feature value is a real number.

The classification unit 262 calculates a score indicating how well the data A corresponds to each deep case, from the feature vector B generated by the feature vector generation unit 260 and the classification model of each deep case stored in the classification model storage unit 24. Specifically, for each deep case, a score indicating how well the feature vector B corresponds to that deep case is calculated from the feature vector B and the classification model for that deep case. Deep cases whose score is at or above a threshold are output as the estimated deep cases. The data A in FIG. 4 is converted into the feature vector B in FIG. 5 by the feature vector generation unit 260, and the classification unit 262 estimates its deep case to be the instrument case.
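The scoring and thresholding step can be sketched as below, assuming each per-deep-case model is a linear scorer stored as a `(weights, bias)` pair. That representation, the function names, the hand-set weights, and the default threshold of 0.0 are all assumptions; the text fixes neither the model form nor the threshold value.

```python
def score(model, feature_vector):
    """Linear score w.x + b for one deep case's binary model."""
    weights, bias = model
    return bias + sum(weights.get(f, 0.0) * v for f, v in feature_vector.items())


def estimate_deep_cases(models, feature_vector, threshold=0.0):
    """Return (deep case, score) pairs at or above the threshold, best first."""
    scored = [(case, score(m, feature_vector)) for case, m in models.items()]
    kept = [(c, s) for c, s in scored if s >= threshold]
    return sorted(kept, key=lambda cs: cs[1], reverse=True)


# Hypothetical hand-set models: deep case -> (weights, bias).
models = {
    "instrument": ({"noun=箸": 1.5, "pred=食べる": 0.2}, -0.5),
    "location": ({"noun=部屋": 1.5, "pred=食べる": 0.2}, -0.5),
}
feature_vector_b = {"noun=箸": 1.0, "pred=食べる": 1.0}

result = estimate_deep_cases(models, feature_vector_b)
print([case for case, _ in result])
```

Because every deep case is scored independently, more than one deep case (or none) can pass the threshold; lowering `threshold` trades precision for coverage.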

The above describes a configuration example of the deep case analysis device according to claim 1 of the present invention. Learning and estimation may also be performed with the attached part of the nominal phrase fixed to a single value. That is, learning is performed after restricting the labeled data set to data having the fixed attached part, and estimation takes as input data A having that same attached part. Because features relating to the attached part of the nominal phrase are then common to all data and carry no information for classification, the labeled feature vector set generation unit 220 and the feature vector generation unit 260 do not extract them.

For example, when the attached part of the nominal phrase is fixed to で, the labeled data set in FIG. 2 becomes that of FIG. 6, which is restricted to data whose attached part is で. Estimation takes as input data A whose attached part is で, as in FIG. 7. Because features relating to the attached part of the nominal phrase are not extracted, FIGS. 6 and 7 omit it.
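Restricting the data to one fixed attached part amounts to a filter followed by dropping the now-constant field. The sketch below assumes records are dictionaries with hypothetical field names (`nominal`, `nominal_attached`, `predicate`, `predicate_attached`, `deep_case`); the sample records are illustrative and are not the contents of FIG. 2.

```python
def restrict_to_particle(labeled_data, particle):
    """Keep records whose nominal-phrase attached part equals `particle`,
    then drop that field, since it is constant and carries no information."""
    return [
        {k: v for k, v in record.items() if k != "nominal_attached"}
        for record in labeled_data
        if record["nominal_attached"] == particle
    ]


# Hypothetical labeled records.
labeled_data = [
    {"nominal": "箸", "nominal_attached": "で", "predicate": "食べる",
     "predicate_attached": "", "deep_case": "instrument"},
    {"nominal": "部屋", "nominal_attached": "で", "predicate": "食べる",
     "predicate_attached": "", "deep_case": "location"},
    {"nominal": "ねずみ", "nominal_attached": "が", "predicate": "食べる",
     "predicate_attached": "られる", "deep_case": "object"},
]

de_only = restrict_to_particle(labeled_data, "で")
print([r["nominal"] for r in de_only])
```

A separate set of models would then be trained on each such subset, and input data is routed to the models that match its attached part.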

Next, the features and feature values that the labeled feature vector set generation unit 220 and the feature vector generation unit 260 extract from the data, and that are effective for classifying deep cases, are described in detail.

Because a deep case is the semantic role of a noun with respect to a predicate, semantic information about each morpheme can serve as an effective feature for classification. Because the choice of deep case also depends on the attached parts of the nominal phrase and the predicate phrase, their surface forms can likewise serve as effective features.

Accordingly, surface strings, parts of speech, or semantic categories present in the data can be used as features. Even when two such features have the same string, they are treated as different features if they come from different sources (nominal, attached part of the nominal phrase, predicate, or attached part of the predicate phrase).

Surface-form features include the strings of the nominal, the attached part of the nominal phrase, the predicate (dictionary form), and the attached part of the predicate phrase. The surface forms of their individual constituent morphemes may also be used; in that case, for the nominal, only the surface form of its last constituent morpheme may be taken as a feature. For the sixth data item in FIG. 2, the attached part of the nominal phrase is が and the attached part of the predicate phrase is られる. From such features the data can be identified as a passive or potential expression, and deep cases can be learned and estimated accordingly.

Part-of-speech features include the part of speech of the last constituent morpheme of the nominal and the part of speech of the predicate. Depending on the morphological analyzer, a part of speech may consist of several sub-parts of speech; in that case either the whole part of speech or each sub-part of speech can be taken as a feature. Some sub-parts of speech indicate that a word is a person name, a place name, or the like, and such information is also an effective feature for classifying deep cases.

A semantic category is a group of synonymous or near-synonymous words collected into a single category. By assigning a semantic category to each word in the word dictionary used for morphological analysis, the semantic category of the last constituent morpheme of the nominal and the semantic category of the predicate can be taken as features.

Every distinct word surface form, part of speech, and semantic category occurring in the training data becomes a feature. The value of a feature is 1 when the target data contains that feature and 0 when it does not.
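The binary feature scheme described above can be sketched as follows. This is an illustrative sketch only: the dictionary keys (`nominal`, `nominal_adjunct`, `predicate`, `predicate_adjunct`), the function names, and the two toy data items are assumptions introduced here, and only surface-form features are shown (parts of speech and semantic categories would be added in the same way).

```python
from typing import Dict, List, Set, Tuple

# Hypothetical names for the four extraction sources named in the text.
SOURCES = ("nominal", "nominal_adjunct", "predicate", "predicate_adjunct")
Feature = Tuple[str, str]  # (source type, surface string)


def extract_features(item: Dict[str, str]) -> Set[Feature]:
    """Pair each non-empty string with its source type, so that the same
    string taken from different sources yields different features."""
    return {(src, item[src]) for src in SOURCES if item.get(src)}


def build_vocabulary(items: List[Dict[str, str]]) -> List[Feature]:
    """Every distinct feature seen in the training data becomes a dimension."""
    return sorted({f for item in items for f in extract_features(item)})


def to_binary_vector(item: Dict[str, str], vocab: List[Feature]) -> List[int]:
    """Feature value is 1 if the item contains the feature, otherwise 0."""
    present = extract_features(item)
    return [1 if f in present else 0 for f in vocab]


# Toy training items (assumed for illustration).
train = [
    {"nominal": "ケーキ", "nominal_adjunct": "を", "predicate": "食べる", "predicate_adjunct": ""},
    {"nominal": "学生", "nominal_adjunct": "が", "predicate": "書く", "predicate_adjunct": ""},
]
vocab = build_vocabulary(train)
vector = to_binary_vector(train[0], vocab)
```

Keying each feature by its source type is one simple way to keep identical strings from different sources apart; a string-prefix scheme would work equally well.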

The set of feature/feature-value pairs represented by the feature vector may also include, for the nominal in the data, a set obtained from the corpus as follows: from each nominal phrase that contains the nominal and each predicate phrase in a dependency relation with it, the pair (nominal-phrase adjunct part, predicate phrase or predicate) is taken, and each such pair is combined with its frequency. Here the nominal may be taken to be its last constituent morpheme. FIG. 8 shows, for the nominal "鉛筆" (pencil), the set of (nominal-phrase adjunct part, predicate (terminal form)) pairs and their frequencies, taken from nominal phrases containing "鉛筆" and the predicate phrases in a dependency relation with them in the corpus. This set is used as the set of feature/feature-value pairs extracted from the nominal "鉛筆".

Likewise, the set of feature/feature-value pairs represented by the feature vector may include, for the predicate in the data, the set of nominal phrases that are in a dependency relation in the corpus with a predicate phrase containing that predicate, each combined with its frequency; or, for the predicate phrase in the data, the set of nominal phrases in a dependency relation with that predicate phrase in the corpus, each combined with its frequency. The nominal in each nominal phrase may be taken to be its last constituent morpheme. FIG. 9 shows, for the predicate "書く" (write), the set of (nominal, nominal-phrase adjunct part) pairs and their frequencies, taken from nominal phrases in a dependency relation with predicate phrases containing "書く" in the corpus. This set is used as the set of feature/feature-value pairs extracted from the predicate "書く".

The sets of feature/feature-value pairs described above are called co-occurrence vectors. Co-occurrence vectors are adopted because words and phrases whose co-occurrence vectors are close are semantically close.
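As an illustration of how such co-occurrence vectors could be counted, the sketch below builds them from a list of dependency triples. The toy corpus, the triple layout (nominal, nominal-phrase adjunct part, predicate in terminal form), and the function names are assumptions made for this example; the cosine measure is likewise only one possible way to compare two vectors and is not prescribed by the text.

```python
from collections import Counter
from math import sqrt

# Toy corpus of dependency triples (assumed data, not taken from FIG. 8 or FIG. 9).
corpus = [
    ("鉛筆", "で", "書く"),
    ("鉛筆", "で", "書く"),
    ("鉛筆", "を", "削る"),
    ("ペン", "で", "書く"),
    ("ペン", "を", "買う"),
    ("学生", "が", "書く"),
]


def nominal_cooccurrence(nominal, triples):
    """Co-occurrence vector of a nominal: (adjunct part, predicate) -> frequency."""
    return Counter((adj, pred) for n, adj, pred in triples if n == nominal)


def predicate_cooccurrence(predicate, triples):
    """Co-occurrence vector of a predicate: (nominal, adjunct part) -> frequency."""
    return Counter((n, adj) for n, adj, pred in triples if pred == predicate)


def cosine(u, v):
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


pencil = nominal_cooccurrence("鉛筆", corpus)
pen = nominal_cooccurrence("ペン", corpus)
student = nominal_cooccurrence("学生", corpus)
write = predicate_cooccurrence("書く", corpus)
```

In this toy corpus "鉛筆" and "ペン" share the entry ("で", "書く") and so have a positive similarity, while "鉛筆" and "学生" share no entry.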

Further, the set of feature/feature-value pairs represented by the feature vector may be any of the above sets in which entries whose predicate (in the predicate phrase) or nominal (in the nominal phrase) has the same semantic category, and whose remaining surface information is identical, are treated as the same entry and their frequencies are summed. The semantic category of a nominal may be taken to be that of its last constituent morpheme. FIG. 10 is obtained from the co-occurrence vector of FIG. 9 as follows. For (学生, が) and (先生, が), "学生" (student) and "先生" (teacher) share the semantic category [人] (person) and the remaining surface information is "が" in both, so the two features are merged into ([人], が) with the summed frequency 100. For (本, を) and (小説, を), "本" (book) and "小説" (novel) share the semantic category [書物] (written works) and the remaining surface information is "を" in both, so the two features are merged into ([書物], を) with the summed frequency 374. For (横浜, で), the semantic category of "横浜" (Yokohama) is [地名] (place name), so the entry becomes ([地名], で) with its frequency 27.
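The merging step can be illustrated with a short sketch. The merged totals (100, 374, 27) are those stated above; the individual frequencies before merging are hypothetical splits chosen only so that they sum to those totals, since the per-entry values of FIG. 9 are not reproduced in the text. The function and variable names are also assumptions.

```python
from collections import Counter

# Semantic categories as stated in the text.
semantic_category = {
    "学生": "[人]", "先生": "[人]",
    "本": "[書物]", "小説": "[書物]",
    "横浜": "[地名]",
}

# Co-occurrence vector of the predicate "書く".
# The per-entry counts are HYPOTHETICAL; only their sums come from the text.
cooccurrence = Counter({
    ("学生", "が"): 62, ("先生", "が"): 38,    # assumed split of 100
    ("本", "を"): 250, ("小説", "を"): 124,    # assumed split of 374
    ("横浜", "で"): 27,
})


def merge_by_category(vector, categories):
    """Replace each nominal by its semantic category and sum the frequencies
    of entries that become identical. Nominals without a category are kept."""
    merged = Counter()
    for (nominal, adjunct), freq in vector.items():
        merged[(categories.get(nominal, nominal), adjunct)] += freq
    return merged


merged = merge_by_category(cooccurrence, semantic_category)
# merged == {("[人]", "が"): 100, ("[書物]", "を"): 374, ("[地名]", "で"): 27}
```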

The concept vector of each morpheme in the data can also be used as the set of feature/feature-value pairs represented by the feature vector. The word concept vector generated by the method of Non-Patent Document 2 is one example of a concept vector; it has the property that the concept vectors of semantically close words are close to each other.

[Non-Patent Document 2] 別所克人, 内山俊郎, 内山匡, 片岡良治, 奥雅博, "Corpus concept base generation method based on co-occurrence between words and semantic attributes" (単語・意味属性間共起に基づくコーパス概念ベースの生成方式), IPSJ Journal (情報処理学会論文誌), Vol. 49, No. 12, pp. 3997-4006, Dec. 2008.

For example, the concept vector of the last constituent morpheme of the nominal, or the concept vector of the predicate, is taken as the set of feature/feature-value pairs represented by the feature vector. Alternatively, the concept vectors of all constituent morphemes of the nominal may be summed and the result normalized to length 1.
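The summation and normalization to length 1 can be sketched as below. The two three-dimensional vectors are made-up values for illustration; real concept vectors would come from a method such as that of Non-Patent Document 2, which is not reproduced here.

```python
from math import sqrt
from typing import List, Sequence


def add_and_normalize(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Sum the concept vectors of the constituent morphemes and scale the
    result to Euclidean length 1 (a zero sum is returned unchanged)."""
    dim = len(vectors[0])
    total = [sum(v[i] for v in vectors) for i in range(dim)]
    norm = sqrt(sum(x * x for x in total))
    if norm == 0.0:
        return total
    return [x / norm for x in total]


# Hypothetical concept vectors of two constituent morphemes of one nominal.
morpheme_vectors = [[3.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
nominal_vector = add_and_normalize(morpheme_vectors)
```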

FIG. 11 shows an example of the processing flow of the learning unit 22. When the input unit 10 receives a set of labeled data (data with correct deep cases attached), the learning processing routine shown in FIG. 11 is executed.

First, in step S100, the labeled feature vector set generation unit 220 acquires the set of labeled data received by the input unit 10.

Then, in step S102, for each labeled data item in the set acquired in step S100, the labeled feature vector set generation unit 220 generates a feature vector, that is, a set of feature/feature-value pairs, from the nominal phrase and predicate phrase that constitute the data, thereby generating a labeled feature vector set.

In step S104, the classification model generation unit 222 generates, from the labeled feature vector set generated by the labeled feature vector set generation unit 220, a classification model for each deep case that classifies whether or not data has that deep case. The classification model generation unit 222 then stores the classification models in the classification model storage unit 24 and ends the learning processing routine.
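Steps S100 to S104 can be sketched as a one-model-per-deep-case training loop. This is a sketch under several assumptions: the passage above does not name a learning algorithm, so a simple perceptron is used purely as a stand-in; the feature scheme is reduced to three surface-form features; and the four training items, the function names, and the use of the labels 主格 and 対象格 for the toy data are illustrative choices.

```python
from typing import Dict, List, Tuple

Data = Tuple[str, str, str]           # (nominal, nominal-phrase adjunct part, predicate)
FeatureVector = Dict[str, float]
Model = Tuple[Dict[str, float], float]  # (weights, bias)


def featurize(data: Data) -> FeatureVector:
    """S102: build a feature vector (surface-form features only, as a simplification)."""
    nominal, adjunct, predicate = data
    return {f"nominal={nominal}": 1.0,
            f"nominal_adjunct={adjunct}": 1.0,
            f"predicate={predicate}": 1.0}


def score(model: Model, x: FeatureVector) -> float:
    weights, bias = model
    return bias + sum(weights.get(f, 0.0) * v for f, v in x.items())


def train_binary(examples: List[Tuple[FeatureVector, int]], epochs: int = 10) -> Model:
    """Stand-in learner: a perceptron separating one deep case (+1) from the rest (-1)."""
    weights: Dict[str, float] = {}
    bias = 0.0
    for _ in range(epochs):
        for x, y in examples:
            if y * score((weights, bias), x) <= 0:  # misclassified: update
                for f, v in x.items():
                    weights[f] = weights.get(f, 0.0) + y * v
                bias += y
    return weights, bias


def learn(labeled_data: List[Tuple[Data, str]]) -> Dict[str, Model]:
    """S100-S104: labeled data -> labeled feature vectors -> one model per deep case."""
    vectors = [(featurize(d), case) for d, case in labeled_data]
    cases = sorted({case for _, case in vectors})
    return {case: train_binary([(x, 1 if c == case else -1) for x, c in vectors])
            for case in cases}


# Toy labeled data (assumed for illustration).
labeled_data = [
    (("学生", "が", "書く"), "主格"),
    (("先生", "が", "読む"), "主格"),
    (("本", "を", "読む"), "対象格"),
    (("小説", "を", "書く"), "対象格"),
]
models = learn(labeled_data)
```

The returned dictionary plays the role of the classification model storage unit 24: it holds one binary model per deep case.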

FIG. 12 shows an example of the processing flow of the estimation unit 26. When the input unit 10 receives data A whose deep case is to be estimated, the estimation processing routine shown in FIG. 12 is executed.

First, in step S200, the feature vector generation unit 260 acquires the data A received by the input unit 10.

Next, in step S202, the feature vector generation unit 260 generates a feature vector B from the data A acquired in step S200, using the same algorithm that the labeled feature vector set generation unit 220 uses to generate feature vectors from data.

Next, in step S204, the classification unit 262 uses the feature vector B generated in step S202 and the classification model of each deep case stored in the classification model storage unit 24 to calculate, for each deep case, a score indicating how well the data A acquired in step S200 corresponds to that deep case. Each deep case whose score is at or above a given threshold is taken as an estimated deep case.

Then, in step S206, the classification unit 262 outputs the estimated deep case or cases obtained in step S204 as the estimation result and ends the estimation processing routine.
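Steps S200 to S206 amount to scoring data A against every stored model and keeping the deep cases that reach the threshold. The sketch below assumes linear models stored as (weights, bias) pairs; the hand-set weights, the threshold value 0.0, and all names are hypothetical and serve only to make the example self-contained.

```python
from typing import Dict, List, Tuple

Data = Tuple[str, str, str]             # (nominal, nominal-phrase adjunct part, predicate)
Model = Tuple[Dict[str, float], float]  # (weights, bias)


def featurize(data: Data) -> Dict[str, float]:
    """S202: must be the same feature generation as the one used in learning."""
    nominal, adjunct, predicate = data
    return {f"nominal={nominal}": 1.0,
            f"nominal_adjunct={adjunct}": 1.0,
            f"predicate={predicate}": 1.0}


def score_all(data_a: Data, models: Dict[str, Model]) -> Dict[str, float]:
    """S204 (first half): one score per deep case."""
    b = featurize(data_a)
    return {case: bias + sum(weights.get(f, 0.0) * v for f, v in b.items())
            for case, (weights, bias) in models.items()}


def estimate(data_a: Data, models: Dict[str, Model], threshold: float = 0.0) -> List[str]:
    """S204 (second half) and S206: keep deep cases whose score is >= threshold."""
    scores = score_all(data_a, models)
    return [case for case, s in scores.items() if s >= threshold]


# Hypothetical stored models (weights set by hand, not learned).
stored_models: Dict[str, Model] = {
    "主格": ({"nominal_adjunct=が": 2.0, "nominal_adjunct=を": -2.0}, 0.0),
    "対象格": ({"nominal_adjunct=を": 2.0, "nominal_adjunct=が": -2.0}, 0.0),
}
result = estimate(("ケーキ", "を", "食べる"), stored_models)
```

Because the selection is by threshold rather than by maximum, more than one deep case, or none, may be returned for a given input.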

As described above, the deep case analysis device of this embodiment estimates deep cases accurately, and this improves the accuracy of processing that converts text into semantic structures and then performs matching (search, etc.) or transformation (generation, summarization, translation, etc.) between texts at the semantic-structure level.

The deep case analysis device of this embodiment can be built at lower cost than conventional methods and has the robustness needed to estimate deep cases accurately.

The processing described above can be built as a program, and the program can be installed from a communication line or a recording medium and executed by means such as a CPU.

The present invention is not limited to the embodiments described above, and various modifications and applications are possible within the scope of the claims.

For example, this embodiment has been described for the case where the learning of the classification model and the estimation of the deep case are implemented in a single device, but the invention is not limited to this. The learning of the classification model and the estimation of the deep case may be implemented as separate devices, namely a deep case learning device comprising the learning unit 22 and a deep case estimation device comprising the estimation unit 26.

The present invention is applicable to language processing technology that converts text into semantic structures and then performs matching (search, etc.) or transformation (generation, summarization, translation, etc.) between texts at the semantic-structure level.

Description of Symbols

10 Input unit
20 Computation unit
22 Learning unit
24 Classification model storage unit
26 Estimation unit
30 Output unit
100 Deep case analysis device
220 Labeled feature vector set generation unit
222 Classification model generation unit
260 Feature vector generation unit
262 Classification unit

Claims

A deep case learning device comprising:
a labeled feature vector set generation unit that receives as input a set of labeled data, each item of which is a pair of data consisting of a nominal phrase and a predicate phrase in a dependency relation and the correct deep case corresponding to the data, and that generates a labeled feature vector set by generating, for each item of labeled data, a feature vector that is a set of pairs of features and feature values from the data; and
a classification model generation unit that generates, from the labeled feature vector set, a classification model for classifying deep cases,
wherein, as the set of pairs of the features and the feature values,
the surface character strings present in the nominal phrase and the predicate phrase in the data are used,
and, in addition, one of the following is taken:
for the nominal in the data, a set of pairs each consisting of (i) a pair of the adjunct part of a nominal phrase and the predicate phrase or predicate, taken from a nominal phrase containing the nominal and a predicate phrase in a dependency relation with it in a corpus, and (ii) the frequency of that pair, or such a set in which, when the frequency of the pair is counted, pairs whose predicates in the predicate phrases have the same semantic category and whose other surface information is identical are treated as the same pair;
for the predicate in the data, a set of pairs each consisting of a nominal phrase in a dependency relation, in the corpus, with a predicate phrase containing the predicate and the frequency of that nominal phrase, or such a set in which, when the frequency of the nominal phrase is counted, nominal phrases whose nominals have the same semantic category and whose other surface information is identical are treated as the same nominal phrase; or
for the predicate phrase in the data, a set of pairs each consisting of a nominal phrase in a dependency relation with the predicate phrase in the corpus and the frequency of that nominal phrase, or such a set in which, when the frequency of the nominal phrase is counted, nominal phrases whose nominals have the same semantic category and whose other surface information is identical are treated as the same nominal phrase.

A deep case estimation device comprising:
a feature vector generation unit that receives data A consisting of a nominal phrase and a predicate phrase in a dependency relation and generates, from the data A, a feature vector B that is a set of pairs of features and feature values; and
a classification unit that calculates a score for each deep case from the feature vector B and a classification model for classifying deep cases, the classification model having been generated in advance from a labeled feature vector set, which is the set of the feature vectors generated for the respective items of labeled data in a set of labeled data, each item being a pair of data consisting of a nominal phrase and a predicate phrase in a dependency relation and the correct deep case corresponding to the data,
wherein, as the set of pairs of the features and the feature values,
the surface character strings present in the nominal phrase and the predicate phrase in the data are used,
and, in addition, one of the following is taken:
for the nominal in the data, a set of pairs each consisting of (i) a pair of the adjunct part of a nominal phrase and the predicate phrase or predicate, taken from a nominal phrase containing the nominal and a predicate phrase in a dependency relation with it in a corpus, and (ii) the frequency of that pair, or such a set in which, when the frequency of the pair is counted, pairs whose predicates in the predicate phrases have the same semantic category and whose other surface information is identical are treated as the same pair;
for the predicate in the data, a set of pairs each consisting of a nominal phrase in a dependency relation, in the corpus, with a predicate phrase containing the predicate and the frequency of that nominal phrase, or such a set in which, when the frequency of the nominal phrase is counted, nominal phrases whose nominals have the same semantic category and whose other surface information is identical are treated as the same nominal phrase; or
for the predicate phrase in the data, a set of pairs each consisting of a nominal phrase in a dependency relation with the predicate phrase in the corpus and the frequency of that nominal phrase, or such a set in which, when the frequency of the nominal phrase is counted, nominal phrases whose nominals have the same semantic category and whose other surface information is identical are treated as the same nominal phrase.

A deep case estimation device comprising:
a feature vector generation unit that receives data A consisting of a nominal phrase and a predicate phrase in a dependency relation and generates, from the data A, a feature vector B that is a set of pairs of features and feature values; and
a classification unit that calculates a score for each deep case from the feature vector B and a classification model for classifying deep cases, the classification model having been generated in advance from a labeled feature vector set, which is the set of the feature vectors generated for the respective items of labeled data in a set of labeled data, each item being a pair of data consisting of a nominal phrase and a predicate phrase in a dependency relation and the correct deep case corresponding to the data,
wherein, as the set of pairs of the features and the feature values,
the surface character strings present in the nominal phrase and the predicate phrase in the data are used,
and, in addition, one of the following is taken:
for the nominal in the data, a set of pairs each consisting of (i) a pair of the adjunct part of a nominal phrase and the predicate phrase or predicate, taken from a nominal phrase containing the nominal and a predicate phrase in a dependency relation with it in a corpus, and (ii) the frequency of that pair, or such a set in which, when the frequency of the pair is counted, pairs whose predicates in the predicate phrases have the same semantic category and whose other surface information is identical are treated as the same pair;
for the predicate in the data, a set of pairs each consisting of a nominal phrase in a dependency relation, in the corpus, with a predicate phrase containing the predicate and the frequency of that nominal phrase, or such a set in which, when the frequency of the nominal phrase is counted, nominal phrases whose nominals have the same semantic category and whose other surface information is identical are treated as the same nominal phrase; or
for the predicate phrase in the data, a set of pairs each consisting of a nominal phrase in a dependency relation with the predicate phrase in the corpus and the frequency of that nominal phrase, or such a set in which, when the frequency of the nominal phrase is counted, nominal phrases whose nominals have the same semantic category and whose other surface information is identical are treated as the same nominal phrase,
and wherein, as the set of pairs of the features and the feature values, the surface form of the nominal-phrase adjunct part and the surface form of the predicate-phrase adjunct part present in the data are used.

A deep case estimation device comprising:
a feature vector generation unit that receives data A consisting of a nominal phrase and a predicate phrase in a dependency relation and generates, from the data A, a feature vector B that is a set of pairs of features and feature values; and
a classification unit that calculates a score for each deep case from the feature vector B and a classification model for classifying deep cases, the classification model having been generated in advance from a labeled feature vector set, which is the set of the feature vectors generated for the respective items of labeled data in a set of labeled data, each item being a pair of data consisting of a nominal phrase and a predicate phrase in a dependency relation and the correct deep case corresponding to the data,
wherein, as the set of pairs of the features and the feature values,
the surface character strings present in the nominal phrase and the predicate phrase in the data are used,
and, in addition, one of the following is taken:
for the nominal in the data, a set of pairs each consisting of (i) a pair of the adjunct part of a nominal phrase and the predicate phrase or predicate, taken from a nominal phrase containing the nominal and a predicate phrase in a dependency relation with it in a corpus, and (ii) the frequency of that pair, or such a set in which, when the frequency of the pair is counted, pairs whose predicates in the predicate phrases have the same semantic category and whose other surface information is identical are treated as the same pair;
for the predicate in the data, a set of pairs each consisting of a nominal phrase in a dependency relation, in the corpus, with a predicate phrase containing the predicate and the frequency of that nominal phrase, or such a set in which, when the frequency of the nominal phrase is counted, nominal phrases whose nominals have the same semantic category and whose other surface information is identical are treated as the same nominal phrase; or
for the predicate phrase in the data, a set of pairs each consisting of a nominal phrase in a dependency relation with the predicate phrase in the corpus and the frequency of that nominal phrase, or such a set in which, when the frequency of the nominal phrase is counted, nominal phrases whose nominals have the same semantic category and whose other surface information is identical are treated as the same nominal phrase,
wherein, as the set of pairs of the features and the feature values, the surface form of the nominal-phrase adjunct part and the surface form of the predicate-phrase adjunct part present in the data are used,
and wherein, as the set of pairs of the features and the feature values, for features extracted from the nominal in the data, information on the last constituent morpheme of the nominal is used.

A deep case learning method in a deep case learning device comprising a labeled feature vector set generation unit and a classification model generation unit, the method comprising:
a step in which the labeled feature vector set generation unit receives as input a set of labeled data, each item of which is a pair of data consisting of a nominal phrase and a predicate phrase in a dependency relation and the correct deep case corresponding to the data, and generates a labeled feature vector set by generating, for each item of labeled data, a feature vector that is a set of pairs of features and feature values from the data; and
a step in which the classification model generation unit generates, from the labeled feature vector set, a classification model for classifying deep cases,
wherein, as the set of pairs of the features and the feature values,
the surface character strings present in the nominal phrase and the predicate phrase in the data are used,
and, in addition, one of the following is taken:
for the nominal in the data, a set of pairs each consisting of (i) a pair of the adjunct part of a nominal phrase and the predicate phrase or predicate, taken from a nominal phrase containing the nominal and a predicate phrase in a dependency relation with it in a corpus, and (ii) the frequency of that pair, or such a set in which, when the frequency of the pair is counted, pairs whose predicates in the predicate phrases have the same semantic category and whose other surface information is identical are treated as the same pair;
for the predicate in the data, a set of pairs each consisting of a nominal phrase in a dependency relation, in the corpus, with a predicate phrase containing the predicate and the frequency of that nominal phrase, or such a set in which, when the frequency of the nominal phrase is counted, nominal phrases whose nominals have the same semantic category and whose other surface information is identical are treated as the same nominal phrase; or
for the predicate phrase in the data, a set of pairs each consisting of a nominal phrase in a dependency relation with the predicate phrase in the corpus and the frequency of that nominal phrase, or such a set in which, when the frequency of the nominal phrase is counted, nominal phrases whose nominals have the same semantic category and whose other surface information is identical are treated as the same nominal phrase.

A deep case estimation method in a deep case estimation device comprising a feature vector generation unit and a classification unit, the method comprising:
a step in which the feature vector generation unit receives data A consisting of a nominal phrase and a predicate phrase in a dependency relation and generates, from the data A, a feature vector B that is a set of pairs of features and feature values; and
a step in which the classification unit calculates a score for each deep case from the feature vector B and a classification model for classifying deep cases, the classification model having been generated in advance from a labeled feature vector set, which is the set of the feature vectors generated for the respective items of labeled data in a set of labeled data, each item being a pair of data consisting of a nominal phrase and a predicate phrase in a dependency relation and the correct deep case corresponding to the data,
wherein, as the set of pairs of the features and the feature values,
the surface character strings present in the nominal phrase and the predicate phrase in the data are used,
and, in addition, one of the following is taken:
for the nominal in the data, a set of pairs each consisting of (i) a pair of the adjunct part of a nominal phrase and the predicate phrase or predicate, taken from a nominal phrase containing the nominal and a predicate phrase in a dependency relation with it in a corpus, and (ii) the frequency of that pair, or such a set in which, when the frequency of the pair is counted, pairs whose predicates in the predicate phrases have the same semantic category and whose other surface information is identical are treated as the same pair;
for the predicate in the data, a set of pairs each consisting of a nominal phrase in a dependency relation, in the corpus, with a predicate phrase containing the predicate and the frequency of that nominal phrase, or such a set in which, when the frequency of the nominal phrase is counted, nominal phrases whose nominals have the same semantic category and whose other surface information is identical are treated as the same nominal phrase; or
for the predicate phrase in the data, a set of pairs each consisting of a nominal phrase in a dependency relation with the predicate phrase in the corpus and the frequency of that nominal phrase, or such a set in which, when the frequency of the nominal phrase is counted, nominal phrases whose nominals have the same semantic category and whose other surface information is identical are treated as the same nominal phrase.

A program for causing a computer to function as each unit of the deep case learning device according to claim 1.

A program for causing a computer to function as each unit of the deep case estimation device according to any one of claims 2 to 4.