JP5564705B2

JP5564705B2 - Sentence structure analyzing apparatus, sentence structure analyzing method, and sentence structure analyzing program

Info

Publication number: JP5564705B2
Application number: JP2010161464A
Authority: JP
Inventors: 修今一
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2010-07-16
Filing date: 2010-07-16
Publication date: 2014-08-06
Anticipated expiration: 2030-07-16
Also published as: JP2012022599A

Description

The present invention relates to a sentence structure analyzing apparatus, and more particularly to a technique for analyzing the dependency relations between the phrases (bunsetsu) that make up a sentence.

To understand a sentence (text) written in a natural language, its structure must be made explicit. A known technique for analyzing sentence structure is dependency analysis, which analyzes the dependency relations (modifier-modifiee relations) between the constituents (phrases) of a sentence. Here, a phrase is a word sequence consisting of one or more content words (verbs, nouns, etc.) followed by zero or more function words (particles, auxiliary verbs, etc.). For example, dependency analysis of the sentence 「太郎が本を読んだ」 ("Taro read a book") identifies the phrases 「太郎が」 (Taro + subject particle), 「本を」 (book + object particle) and 「読んだ」 (read), together with the dependency relations in which 「太郎が」 depends on 「読んだ」 and 「本を」 depends on 「読んだ」.

A central problem in dependency analysis is resolving ambiguity in the choice of head. For example, in the sentence 「昨日買った漱石の本を読んだ」 ("[I] read the Soseki book [I] bought yesterday"), it is ambiguous whether 「買った」 (bought) modifies 「漱石」 (Soseki) or 「本」 (book). In recent years, statistical dependency analysis has become the mainstream technique for resolving such ambiguity (see Non-Patent Document 1). Statistical dependency analysis uses a machine learning method to learn a statistical model of dependency relations from sentences annotated with dependencies, and then analyzes dependencies while resolving ambiguity with the learned model.

Among dependency relations, those whose head is a predicate (a verb, an action noun, etc.) are specifically called predicate-argument structure. Making the predicate-argument structure explicit makes it possible to extract the 4W1H information contained in a sentence (who, when, where, what, and did what). For this reason, natural language processing applications such as information retrieval and information extraction increasingly require highly accurate predicate-argument structure analysis.

However, Japanese frequently omits constituents that can be inferred from context, so analyzing surface dependency relations alone loses necessary dependency information. A surface dependency relation is one that is explicitly expressed in the sentence. For example, analyzing the surface dependencies of the sentence 「鈴木は京都で生まれ、神戸で育った」 ("Suzuki was born in Kyoto and grew up in Kobe") identifies that 「鈴木は」 (Suzuki + topic particle) and 「京都で」 (in Kyoto) depend on 「生まれ」 (was born), and that 「神戸で」 (in Kobe) depends on 「育った」 (grew up). In this example, however, 「鈴木が」, the ga-case (nominative) element of 「育った」, is omitted. Such an omitted case element is called a zero pronoun, and the entity it refers to (here, 「鈴木」) is called its antecedent. A dependency relation therefore exists between the predicate on which the zero pronoun depends (here, 「育った」) and the antecedent (here, 「鈴木」). Such a relation is called an implicit dependency relation. Conventionally, implicit dependency relations have been analyzed by identifying the antecedents of zero pronouns with a method called anaphora resolution.

In anaphora resolution, the dependencies of the sentence are analyzed first. Zero pronouns are then detected using a verb case frame dictionary (a dictionary listing the cases, such as ga and wo, that each verb takes), and antecedents are identified by a method based on linguistic knowledge such as centering theory (see Non-Patent Document 2), by a statistical method (see Non-Patent Document 3), or by a combination of the two (see Patent Document 1).

Detecting zero pronouns requires a large-scale, highly accurate case frame dictionary. Building such a dictionary by hand, however, is costly. A method for automatically constructing a case frame dictionary from a large corpus has also been proposed (see Non-Patent Document 4), but its accuracy is currently insufficient.

Patent Document 1: Japanese Patent Laid-Open No. 2005-025659

Non-Patent Document 1: Taku Kudo, Yuji Matsumoto, "Japanese Dependency Analysis by Cascaded Application of Chunking", IPSJ Journal, Vol. 43, No. 6, pp. 1834-1842, 2002.
Non-Patent Document 2: Barbara J. Grosz, Aravind K. Joshi, Scott Weinstein, "Centering: A Framework for Modeling the Local Coherence of Discourse", Computational Linguistics, Vol. 21, No. 2, 1995.
Non-Patent Document 3: Ryu Iida, Kentaro Inui, Yuji Matsumoto, "Identification of Antecedents of Japanese Zero Pronouns by Machine Learning Considering Contextual Clues", IPSJ Journal, Vol. 45, No. 3, pp. 906-918, 2004.
Non-Patent Document 4: Daisuke Kawahara, Sadao Kurohashi, "Gradual Automatic Acquisition of a Case Frame Dictionary", Journal of Natural Language Processing, Vol. 12, No. 2, pp. 109-131, 2005.

In the prior art described above, highly accurate predicate-argument structure analysis requires highly accurate anaphora resolution, which in turn requires a large-scale, highly accurate case frame dictionary.

As noted above, however, building such a dictionary poses several problems. In addition, a processing flow that performs anaphora resolution after dependency analysis results in a complex model and poor computational efficiency.

The present invention was made in view of these problems. Its object is to provide a sentence structure analyzing apparatus that, for predicate-argument structure analysis, avoids model complexity and improves computational efficiency without using a large-scale, highly accurate case frame dictionary.

A representative example of the invention disclosed in this application is as follows. It is a sentence structure analyzing apparatus that has a processor for executing programs and a memory for storing the programs executed by the processor, and that analyzes the structure of input text. The apparatus comprises: morphological analysis means for dividing the input text into words, using morphemes as units; phrase analysis means for generating a phrase sequence consisting of a plurality of phrases from the words produced by the morphological analysis means; and dependency analysis means for analyzing the dependency relations between the phrases generated by the phrase analysis means. The dependency analysis means selects arbitrary pairs of distinct phrases from the phrase sequence generated by the phrase analysis means, calculates a dependency score for each selected phrase pair, stores the scores in the memory, and determines that a dependency relation exists for every phrase pair whose dependency score is equal to or greater than a predetermined threshold.
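The claimed processing (score every pair of distinct phrases, keep each pair whose score is at or above a threshold) can be sketched as follows. This is a minimal sketch under stated assumptions, not the patentee's implementation: the function name analyze_dependencies, the choice to score ordered pairs in both directions, and the hand-written toy scores standing in for a trained model are all hypothetical.

```python
from itertools import permutations


def analyze_dependencies(phrases, score, threshold):
    """Return (dependent, head) pairs whose dependency score is >= threshold.

    phrases:   phrase strings in text order
    score:     callable(phrases, i, j) -> float, plausibility that phrase i depends on phrase j
    threshold: the predetermined threshold
    """
    scores = {}  # plays the role of the scores "stored in the memory"
    # Every ordered pair of distinct phrases; both directions are scored so that
    # the direction of each dependency is explicit (an assumption of this sketch).
    for i, j in permutations(range(len(phrases)), 2):
        scores[(i, j)] = score(phrases, i, j)
    return [(phrases[i], phrases[j]) for (i, j), s in scores.items() if s >= threshold]


# Hypothetical scores standing in for a trained statistical model.
toy_scores = {
    ("太郎が", "帰ってきた"): 0.9,
    ("学校から", "帰ってきた"): 0.8,
    ("すぐに", "外出した"): 0.9,
    ("太郎が", "外出した"): 0.6,
}


def toy_score(phrases, i, j):
    return toy_scores.get((phrases[i], phrases[j]), 0.0)


phrases = ["太郎が", "学校から", "帰ってきた", "すぐに", "外出した"]
print(analyze_dependencies(phrases, toy_score, threshold=0.5))
# [('太郎が', '帰ってきた'), ('太郎が', '外出した'), ('学校から', '帰ってきた'), ('すぐに', '外出した')]
```

Because nothing limits a phrase to one head, 「太郎が」 is kept with both 「帰ってきた」 and 「外出した」 whenever both scores clear the threshold.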

According to the present invention, predicate-argument structure analysis can avoid model complexity and achieve higher computational efficiency without using a large-scale, highly accurate case frame dictionary.

FIG. 1 is a diagram showing the schematic configuration of the sentence structure analyzing apparatus according to the first embodiment of the present invention.
FIG. 2 is a diagram showing an example of the processing result of the morphological analysis means according to the first embodiment of the present invention.
FIG. 3 is a diagram showing the detailed configuration of the integrated dependency analysis means according to the first embodiment of the present invention.
FIG. 4 is a sequence diagram showing the cooperation between hardware and software in the present invention.
FIG. 5A is a diagram showing an example of the processing result of conventional dependency analysis on multiple sentences.
FIG. 5B is a diagram showing an example of the processing result of the integrated dependency analysis means according to the first embodiment of the present invention on multiple sentences.
FIG. 6A is a diagram illustrating the dependency relations identified in the example of FIG. 5A.
FIG. 6B is a diagram illustrating the dependency relations identified in the example of FIG. 5B.
FIG. 7A is a diagram showing an example of the processing result of conventional dependency analysis on a single sentence.
FIG. 7B is a diagram showing an example of the processing result of the integrated dependency analysis means according to the first embodiment of the present invention on a single sentence.
FIG. 8A is a diagram illustrating the dependency relations identified in the example of FIG. 7A.
FIG. 8B is a diagram illustrating the dependency relations identified in the example of FIG. 7B.
FIG. 9 is a diagram showing the detailed configuration of the integrated dependency analysis means according to the second embodiment of the present invention.
FIG. 10 is a diagram showing an example of the processing result of the integrated dependency analysis means according to the second embodiment of the present invention on multiple sentences.
FIG. 11 is a diagram illustrating the dependency relations identified in the example of FIG. 10.
FIG. 12 is a diagram showing an example of the processing result of the integrated dependency analysis means according to the second embodiment of the present invention on a single sentence.
FIG. 13 is a diagram illustrating the dependency relations identified in the example of FIG. 12.

Embodiments of the present invention are described below with reference to the drawings.

(First Embodiment)
First, a first embodiment of the present invention is described.

FIG. 1 is a diagram showing the schematic configuration of the sentence structure analyzing apparatus 1 according to the first embodiment of the present invention. The sentence structure analyzing apparatus 1 is a computer comprising a memory device 11, an arithmetic processing device 12, an interface device 13, an auxiliary storage device 14, an input device 15 and an output device 16, all interconnected by a bus 30.

The memory device 11 is a storage device such as RAM (Random Access Memory) that, when the sentence structure analyzing apparatus 1 starts, reads and holds programs such as the sentence structure analysis program 20 stored in the auxiliary storage device 14. The memory device 11 also stores the files, data, etc. needed to execute the sentence structure analysis program 20 and other programs. The arithmetic processing device 12 is a processor such as a CPU (Central Processing Unit) that executes the programs stored in the memory device 11. The interface device 13 connects the apparatus to an external network or the like. The auxiliary storage device 14 is a storage device such as an HDD that stores the sentence structure analysis program 20, files, data, etc. The input device 15 (for example, a keyboard or mouse) and the output device 16 (for example, a display) provide the user interface.

The sentence structure analysis program 20 includes analysis request input means (an analysis request input unit) 21, morphological analysis means (a morphological analysis unit) 22, phrase analysis means (a phrase analysis unit) 23, integrated dependency analysis means (an integrated dependency analysis unit) 24, and analysis result display means (an analysis result display unit) 25.

The analysis request input means 21 accepts the text to be analyzed, entered by the user through the input device 15 (a keyboard or the like). The input text may be a single sentence or multiple sentences.

The morphological analysis means 22 performs morphological analysis on the text received by the analysis request input means 21. Morphological analysis is the process of splitting input text (a character string) into words and assigning a part of speech to each word.

FIG. 2 shows an example of the processing result of the morphological analysis means 22 according to the first embodiment of the present invention, for the input text 「太郎が学校から帰ってきた。すぐに外出した。」 ("Taro came home from school. [He] went out right away."). The morphological analysis performed by the morphological analysis means 22 can be implemented with existing tools such as the open-source morphological analysis system ChaSen (茶筌, http://chasen.naist.jp/hiki/ChaSen/).

Returning to FIG. 1, the phrase analysis means 23 identifies the phrase sequence from the text divided into words by the morphological analysis means 22, that is, from the word sequence.

In the example of FIG. 2, the phrase sequence (「太郎が」, 「学校から」, 「帰ってきた」, 「すぐに」, 「外出した」) is identified from the word sequence (「太郎」, 「が」, 「学校」, 「から」, 「帰っ」, 「て」, 「き」, 「た」, 「すぐ」, 「に」, 「外出」, 「し」, 「た」). The phrase sequence produced by the phrase analysis means 23 is passed to the integrated dependency analysis means 24, together with information on the words that make up each phrase. The phrase analysis performed by the phrase analysis means 23 can be implemented with existing tools such as the open-source chunking program YamCha (http://chasen.org/~taku/software/YamCha/).
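The phrase definition used in this document (one or more content words followed by zero or more function words) can be sketched as a simple rule over the FIG. 2 word sequence. This is a rule-based stand-in, not YamCha, which performs statistical chunking. The two-way "content"/"function" tags are hypothetical: a morphological analyzer such as ChaSen outputs finer-grained parts of speech, and mapping them onto these two classes (for example, the 「し」 of 「外出した」) would need extra rules.

```python
def chunk_phrases(tagged_words):
    """Group (word, kind) pairs into phrases.

    kind is "content" (自立語) or "function" (付属語). A phrase is one or more
    content words followed by zero or more function words, so a content word
    that follows a function word starts a new phrase.
    """
    phrases, current, seen_function = [], [], False
    for word, kind in tagged_words:
        if kind == "content" and seen_function:
            phrases.append("".join(current))  # close the previous phrase
            current, seen_function = [], False
        current.append(word)
        if kind == "function":
            seen_function = True
    if current:
        phrases.append("".join(current))
    return phrases


# Word sequence of FIG. 2 with hypothetical coarse tags. 「き」 (from 来る) and
# 「し」 (from する) are tagged "function" so that 「帰ってきた」 and 「外出した」 stay whole.
words = [("太郎", "content"), ("が", "function"), ("学校", "content"), ("から", "function"),
         ("帰っ", "content"), ("て", "function"), ("き", "function"), ("た", "function"),
         ("すぐ", "content"), ("に", "function"), ("外出", "content"), ("し", "function"),
         ("た", "function")]
print(chunk_phrases(words))
# ['太郎が', '学校から', '帰ってきた', 'すぐに', '外出した']
```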

The integrated dependency analysis means 24 identifies the dependency relations between phrases from the phrase sequence received from the phrase analysis means 23. In addition to the dependency relations identified by conventional dependency analysis (surface dependency relations), it identifies the relations between predicates on which zero pronouns depend and their antecedents (implicit dependency relations). In other words, it analyzes surface and implicit dependency relations together, in an integrated manner.

The analysis result display means 25 generates data for displaying the analysis result of the integrated dependency analysis means 24 on the output device 16 (a display or the like), and displays it. Display examples are described later.

FIG. 3 shows the detailed configuration of the integrated dependency analysis means 24 according to the first embodiment of the present invention. The integrated dependency analysis means 24 includes phrase pair selection means (a phrase pair selection unit) 401, dependency score calculation means (a dependency score calculation unit) 402, and dependency relation selection means (a dependency relation selection unit) 403.

The phrase pair selection means 401 selects phrase pairs from the phrase sequence received from the phrase analysis means 23, by any method. For example, numbering the phrases from 1 at the beginning of the text to N at the end, it selects the pairs in the order 1 and 2, 1 and 3, ..., 1 and N, then 2 and 3, ..., 2 and N, ..., and finally N-1 and N. Pairs may be selected from the front of the phrase sequence in this way, or from the back. In particular, when pairs are selected from the back, only those phrase pairs that would have a dependency relation when selected from the front may be selected (for example, the pair 「本を」 and 「読んだ」 in the example of FIG. 8B).
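The front-to-back selection order described above can be written as a short list comprehension. The back-to-front variant here simply reverses that order, which is one assumed reading of "from the back"; both function names are hypothetical. With N phrases there are N(N-1)/2 unordered pairs, so the number of candidates grows quadratically with the length of the input.

```python
def pairs_front_to_back(n):
    """Phrase-number pairs (1,2), (1,3), ..., (1,N), (2,3), ..., (N-1,N), numbered from 1."""
    return [(i, j) for i in range(1, n) for j in range(i + 1, n + 1)]


def pairs_back_to_front(n):
    """One possible back-to-front order: the same pairs, starting from the end of the sequence."""
    return list(reversed(pairs_front_to_back(n)))


print(pairs_front_to_back(4))  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(pairs_back_to_front(4))  # [(3, 4), (2, 4), (2, 3), (1, 4), (1, 3), (1, 2)]
```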

The dependency score calculation means 402 calculates a dependency score for each phrase pair selected by the phrase pair selection means 401. The dependency score is a numerical measure of how plausible (likely) a dependency relation is. Specifically, the dependency score calculation means 402 computes the score of each phrase pair from a statistical model of dependency relations obtained with a machine learning method such as a support vector machine or a decision tree. The statistical model is built by applying the machine learning method to features such as the attributes of each phrase (character string, part-of-speech name, etc.), the distance between the phrases, and the attributes of other phrases lying between them. For example, the feature set of each phrase pair that has a dependency relation is labeled +1 (a positive example), the feature set of each pair that does not is labeled -1 (a negative example), and the labeled data is fed to a machine learning program to create the statistical model. To calculate the dependency score of a given phrase pair, the pair's feature set is fed to the machine learning program, which returns a score reflecting the dependency relation (the range of score values depends on the machine learning algorithm used).
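The train-then-score procedure can be sketched as follows, assuming a hypothetical feature set built from the kinds of attributes listed above (the dependent's function word, the head's part of speech, and the distance between the phrases). The text names support vector machines and decision trees; to stay dependency-free, this sketch substitutes a plain perceptron, which likewise learns a linear score from feature sets labeled +1/-1. The Phrase class, the tags and the gold pairs are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Phrase:
    surface: str  # e.g. "太郎が"
    pos: str      # coarse POS of the phrase's content word (hypothetical tag set)
    func: str     # trailing function word, "" if none


def pair_features(phrases, i, j):
    """Hypothetical features for "phrase i depends on phrase j"."""
    return [
        "bias",
        f"dep_func={phrases[i].func}",  # particle of the dependent
        f"head_pos={phrases[j].pos}",   # POS of the candidate head
        f"dist={min(abs(j - i), 3)}",   # distance in phrases, bucketed at 3+
    ]


class Perceptron:
    """Linear scorer trained on +1/-1 labels (a stand-in for an SVM or decision tree)."""

    def __init__(self):
        self.w = defaultdict(float)

    def score(self, feats):
        return sum(self.w[f] for f in feats)

    def train(self, data, max_epochs=2000):
        for _ in range(max_epochs):
            mistakes = 0
            for feats, label in data:
                if label * self.score(feats) <= 0:  # misclassified or on the boundary
                    for f in feats:
                        self.w[f] += label
                    mistakes += 1
            if mistakes == 0:  # every training pair is on the correct side
                break
        return self


# Training pairs from the FIG. 5B example: gold (dependent, head) index pairs get +1,
# every other forward pair gets -1.
phrases = [Phrase("太郎が", "noun", "が"), Phrase("学校から", "noun", "から"),
           Phrase("帰ってきた", "verb", "た"), Phrase("すぐに", "adverb", "に"),
           Phrase("外出した", "verb", "た")]
gold = {(0, 2), (1, 2), (3, 4), (0, 4)}
data = [(pair_features(phrases, i, j), 1 if (i, j) in gold else -1)
        for i in range(len(phrases)) for j in range(i + 1, len(phrases))]
model = Perceptron().train(data)
for i, j in sorted(gold):
    print(phrases[i].surface, "->", phrases[j].surface, model.score(pair_features(phrases, i, j)))
```

This toy training set is linearly separable, so the perceptron stops after an epoch with no mistakes; at that point every gold pair, including the cross-sentence pair 「太郎が」 to 「外出した」, has a positive score and every other pair a negative one.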

Based on the dependency scores calculated by the dependency score calculation means 402, the dependency relation selection means 403 selects the dependency relations (phrase pairs) whose scores exceed a predetermined threshold. If several dependency relations have scores exceeding the threshold, all of them are selected.
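Selecting every pair above the threshold differs from decoding that keeps a single best head per dependent, and that difference is what lets one phrase take part in several relations. The sketch below contrasts the two on hypothetical scores for the FIG. 5B example; select_single_head is a simplified stand-in for conventional decoding, not the algorithm of any particular parser.

```python
def select_by_threshold(scores, threshold):
    """All (dependent, head) pairs whose score exceeds the threshold."""
    return {pair for pair, s in scores.items() if s > threshold}


def select_single_head(scores):
    """For contrast: keep only the best-scoring head of each dependent."""
    best = {}
    for (dep, head), s in scores.items():
        if dep not in best or s > best[dep][1]:
            best[dep] = (head, s)
    return {(dep, head) for dep, (head, _) in best.items()}


scores = {  # hypothetical scores
    ("太郎が", "学校から"): -0.7,
    ("太郎が", "帰ってきた"): 0.9,
    ("太郎が", "外出した"): 0.6,
    ("学校から", "帰ってきた"): 0.8,
    ("すぐに", "外出した"): 0.9,
}
print(select_by_threshold(scores, 0.0) - select_single_head(scores))
# {('太郎が', '外出した')}
```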

With the configuration above, the integrated dependency analysis means 24 selects, from the phrase sequence received from the phrase analysis means 23, the dependency relations that have high dependency scores.

FIG. 4 is a sequence diagram showing how hardware and software cooperate in the present invention. The character string entered by the user on the input device 15 is sent to the sentence structure analysis program 20 (T1). As described above, the string is processed by the analysis request input means 21, the morphological analysis means 22 and the phrase analysis means 23, and the resulting phrase sequence is sent to the phrase pair selection means 401. For each phrase pair selected by the phrase pair selection means 401, the dependency score calculation means 402 calculates a dependency score using the dependency statistical model 405 (T2). This is repeated until the phrase pair selection means 401 has no more pairs to select (T3). The dependency relation selection means 403 then selects dependency relations from the resulting scored pairs, and the analysis result display means 25 sends the result to the output device 16 (T4).

The processing of the integrated dependency analysis means 24 is described below in comparison with that of conventional dependency analysis.

FIG. 5A shows an example of the processing result of conventional dependency analysis on multiple sentences, and FIG. 5B shows an example of the processing result of the integrated dependency analysis means 24 according to the first embodiment of the present invention on multiple sentences. Both figures show the result of dependency analysis on the phrase sequence (「太郎が」, 「学校から」, 「帰ってきた」, 「すぐに」, 「外出した」) contained in multiple sentences (two in this example).

In conventional dependency analysis (FIG. 5A), given the phrase sequence (「太郎が」, 「学校から」, 「帰ってきた」), it is identified that 「太郎が」 and 「学校から」 depend on 「帰ってきた」. Then, given the phrase sequence (「すぐに」, 「外出した」), it is identified that 「すぐに」 depends on 「外出した」. That is, conventional dependency analysis identifies dependency relations one sentence at a time, and it identifies only surface dependency relations.

In contrast, by executing the processing described with reference to FIG. 3, the integrated dependency analysis means 24 (FIG. 5B) identifies, in addition to the relations above, that 「太郎が」 depends on 「外出した」. That is, the integrated dependency analysis means 24 identifies dependency relations across multiple sentences instead of restricting them to a single sentence. It also identifies not only surface dependency relations but also implicit ones, namely the relation between the predicate on which the zero pronoun depends (here, 「外出した」) and its antecedent (here, 「太郎が」).
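Why the cross-sentence relation can appear only in the integrated analysis is visible in the candidate sets: if pairs are formed within each sentence, 「太郎が」 and 「外出した」 are never paired at all. The sketch below, using a hypothetical helper candidate_pairs, counts the candidates for the two sentences of FIG. 5 under both settings.

```python
from itertools import combinations


def candidate_pairs(sentences, cross_sentence):
    """Index pairs (i, j), i < j, over the concatenated phrase sequence.

    With cross_sentence=False, both phrases must come from the same sentence
    (the conventional one-sentence-at-a-time setting); with True, any two
    phrases of the input text form a candidate.
    """
    spans, start = [], 0
    for sentence in sentences:
        spans.append(range(start, start + len(sentence)))
        start += len(sentence)
    if cross_sentence:
        return list(combinations(range(start), 2))
    return [pair for span in spans for pair in combinations(span, 2)]


sentences = [["太郎が", "学校から", "帰ってきた"], ["すぐに", "外出した"]]
flat = [p for s in sentences for p in s]
within = candidate_pairs(sentences, cross_sentence=False)
across = candidate_pairs(sentences, cross_sentence=True)
target = (flat.index("太郎が"), flat.index("外出した"))
print(len(within), len(across))          # 4 10
print(target in within, target in across)  # False True
```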

FIG. 6A illustrates the dependency relations identified in the example of FIG. 5A, and FIG. 6B illustrates those identified in the example of FIG. 5B. FIGS. 6A and 6B visualize the dependency relations between dependent phrases (rows) and head phrases (columns) as a matrix. A table such as that of FIG. 6B is displayed on the output device 16 (a display or the like) by the analysis result display means 25. The display format is not limited to a table like that of FIG. 6.

As shown in FIG. 6A, conventional dependency analysis identifies three dependency relations (the cells marked ○): 「太郎が」 and 「帰ってきた」, 「学校から」 and 「帰ってきた」, and 「すぐに」 and 「外出した」.

In contrast, as shown in FIG. 6B, the integrated dependency analysis means 24 identifies, in addition to the relations above, the dependency relation between 「太郎が」 and 「外出した」 (the cell marked ◎). The relation marked ◎ is an implicit dependency relation that would conventionally be identified by running case frame analysis and anaphora resolution after dependency analysis. In other words, by executing the processing described with reference to FIG. 3, the integrated dependency analysis means 24 identifies implicit dependency relations without performing case frame analysis or anaphora resolution.
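A plain-text rendering of a FIG. 6-style table can be sketched as follows. The tab-separated layout, the "-" for empty cells, and the function name render_matrix are assumptions; the document specifies only rows as dependents, columns as heads, and the ○/◎ marks.

```python
def render_matrix(phrases, surface, implicit):
    """Tab-separated text version of a FIG. 6-style table.

    Rows are dependent phrases, columns are head phrases; "○" marks a surface
    relation, "◎" an implicit one, and "-" no relation.
    """
    lines = ["\t" + "\t".join(phrases)]
    for dep in phrases:
        cells = []
        for head in phrases:
            if (dep, head) in surface:
                cells.append("○")
            elif (dep, head) in implicit:
                cells.append("◎")
            else:
                cells.append("-")
        lines.append(dep + "\t" + "\t".join(cells))
    return "\n".join(lines)


phrases = ["太郎が", "学校から", "帰ってきた", "すぐに", "外出した"]
surface = {("太郎が", "帰ってきた"), ("学校から", "帰ってきた"), ("すぐに", "外出した")}
implicit = {("太郎が", "外出した")}
print(render_matrix(phrases, surface, implicit))
```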

FIG. 7A shows an example of the processing result of conventional dependency analysis on a single sentence, and FIG. 7B shows an example of the processing result of the integrated dependency analysis means 24 according to the first embodiment of the present invention on a single sentence. Both figures show the result for the phrase sequence (「私は」, 「昨日」, 「東京で」, 「買った」, 「本を」, 「読んだ」) contained in the single sentence 「私は昨日東京で買った本を読んだ」 ("I read the book I bought in Tokyo yesterday").

In conventional dependency analysis (FIG. 7A), given the phrase sequence (「私は」, 「昨日」, 「東京で」, 「買った」, 「本を」, 「読んだ」), it is identified that 「私は」 and 「本を」 depend on 「読んだ」, that 「昨日」 and 「東京で」 depend on 「買った」, and that 「買った」 depends on 「本を」. That is, conventional dependency analysis identifies dependency relations one sentence at a time, and it identifies only surface dependency relations. The ga-case (subject) and wo-case (object) of 「買った」, however, are not identified, and would have to be identified by running anaphora resolution.

In contrast, by executing the processing described with reference to FIG. 3, the integrated dependency analysis means 24 (FIG. 7B) identifies, in addition to the relations above, that 「私は」 depends on 「買った」 and that 「本を」 depends on 「買った」. That is, even within a single sentence, the integrated dependency analysis means 24 identifies not only surface dependency relations but also implicit ones, namely the relations between the predicate on which the zero pronouns depend (here, 「買った」) and their antecedents (here, 「私は」 and 「本を」).

FIG. 8A illustrates the dependency relations identified in the example of FIG. 7A, and FIG. 8B those identified in the example of FIG. 7B. In both figures, the dependency relation between a source phrase (rows) and a target phrase (columns) is visualized as a matrix. The analysis result display means 25 displays a table such as the one in FIG. 8B on the output device 16 (a display or the like). The display format is not limited to the tables shown in FIG. 8.

As shown in FIG. 8A, conventional dependency analysis identifies five dependency relations (marked ○ in the figure): “私は” → “読んだ”, “昨日” → “買った”, “東京で” → “買った”, “買った” → “本を” and “本を” → “読んだ”.

In contrast, as shown in FIG. 8B, the integrated dependency analysis means 24 identifies, in addition to those relations, the dependency relations “私は” → “買った” and “本を” → “買った” (marked ◎ in the figure). The relations marked ◎ are implicit dependency relations that, conventionally, would be identified by running case frame analysis and anaphora resolution after dependency analysis. In other words, by executing the processing described with reference to FIG. 3, the integrated dependency analysis means 24 can identify these implicit dependency relations without executing case frame analysis or anaphora resolution.
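The matrix view of FIGS. 8A and 8B can be approximated in plain text. In this minimal sketch, which is not the display means 25 itself, rows are source phrases and columns are target phrases, ○ marks a surface relation, ◎ an implicit one, and · (an assumed filler) an empty cell; `render_matrix` is a hypothetical helper.

```python
def render_matrix(phrases: list[str], surface: set[tuple[int, int]],
                  implicit: set[tuple[int, int]]) -> str:
    """Render a source (rows) x target (columns) dependency table.

    ○ = surface relation, ◎ = implicit relation, · = no relation (filler is assumed).
    """
    lines = ["\t" + "\t".join(phrases)]
    for s, source in enumerate(phrases):
        cells = []
        for t in range(len(phrases)):
            if (s, t) in implicit:
                cells.append("◎")
            elif (s, t) in surface:
                cells.append("○")
            else:
                cells.append("·")
        lines.append(source + "\t" + "\t".join(cells))
    return "\n".join(lines)


PHRASES = ["私は", "昨日", "東京で", "買った", "本を", "読んだ"]
SURFACE = {(0, 5), (1, 3), (2, 3), (3, 4), (4, 5)}  # FIG. 8A, marked ○
IMPLICIT = {(0, 3), (4, 3)}                          # added in FIG. 8B, marked ◎
print(render_matrix(PHRASES, SURFACE, IMPLICIT))
```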

As noted earlier, statistical dependency analysis is the mainstream approach in conventional dependency analysis (see Non-Patent Document 1). Statistical dependency analysis uses a machine learning method to learn a statistical model of dependency relations from text annotated with dependency relations, and then analyzes dependencies while resolving ambiguity with the learned model. Representative machine learning methods include support vector machines and decision trees. In the example of FIG. 5A, the candidate targets of “太郎が” (Taro, subject) are “学校から” (from school) and “帰ってきた” (came home). The statistical model computes the plausibility (dependency score) of each candidate relation, “太郎が” → “学校から” and “太郎が” → “帰ってきた”, and the more plausible relation is selected. The statistical model can be created by applying a machine learning method that uses as features various information about phrases such as “太郎が” and “学校から” (character strings, part-of-speech names, particle types, distance between phrases, and so on).
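To make the scoring step concrete, here is a minimal sketch for the FIG. 5A example, assuming the sentence consists only of the three phrases shown. The hand-set weights in `WEIGHTS` are hypothetical stand-ins for a model learned by an SVM, decision tree or similar learner, and the logistic squashing is an assumed way of turning a linear score into a plausibility; only particle, part-of-speech and distance features are used.

```python
import math

# Hypothetical hand-set weights standing in for a learned statistical model.
WEIGHTS = {
    "src_particle=が|tgt_pos=verb": 2.0,   # ga-marked phrases tend to attach to predicates
    "src_particle=が|tgt_pos=noun": -1.5,
    "distance": -0.3,                      # nearer targets are slightly preferred
}
BIAS = 0.0


def features(src: dict, tgt: dict, distance: int) -> dict[str, float]:
    """Turn a (source, target) phrase pair into sparse features."""
    return {
        f"src_particle={src['particle']}|tgt_pos={tgt['pos']}": 1.0,
        "distance": float(distance),
    }


def dependency_score(src: dict, tgt: dict, distance: int) -> float:
    """Logistic score in (0, 1): plausibility that src depends on tgt."""
    z = BIAS + sum(WEIGHTS.get(k, 0.0) * v for k, v in features(src, tgt, distance).items())
    return 1.0 / (1.0 + math.exp(-z))


def best_head(phrases: list[dict], i: int) -> int:
    """Conventional choice: the single highest-scoring later phrase."""
    return max(range(i + 1, len(phrases)),
               key=lambda j: dependency_score(phrases[i], phrases[j], j - i))


FIG5A = [
    {"text": "太郎が", "particle": "が", "pos": "noun"},
    {"text": "学校から", "particle": "から", "pos": "noun"},
    {"text": "帰ってきた", "particle": "", "pos": "verb"},
]
print(FIG5A[best_head(FIG5A, 0)]["text"])  # 帰ってきた
```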

The integrated dependency analysis performed by the integrated dependency analysis means 24 likewise uses a machine learning method to learn a statistical model of dependency relations from text annotated with dependency relations, and analyzes dependencies while resolving ambiguity with the learned model. It differs from conventional dependency analysis in the following respects. First, the dependency relations annotated in the training text include implicit dependency relations. Second, conventional dependency analysis considers only dependency relations within one sentence, whereas the integrated dependency analysis means 24 is not limited to relations within one sentence and considers dependency relations across multiple sentences. Third, when the target is ambiguous, conventional dependency analysis selects the single most plausible target (the one with the highest dependency score), whereas the integrated dependency analysis means 24 does not restrict itself to one target but selects multiple plausible targets.
Multiple targets can be selected as follows. First, a dependency score is calculated for each dependency relation to be analyzed, based on a statistical model of dependency relations built with a machine learning method such as a support vector machine or a decision tree. Then, either the dependency relations whose score is equal to or greater than a predetermined threshold are selected, or the dependency relations whose score differs from that of the highest-scoring relation by no more than a predetermined range (expressed, for example, as an absolute or relative value) are selected.
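The two selection rules just described can be sketched as follows. This is an illustration under assumptions: the scores are taken as already computed, and the function name and keyword parameters are hypothetical. Here a relative range is interpreted as a fraction of the best score.

```python
from typing import Optional


def select_heads(scores: dict[int, float], *, threshold: Optional[float] = None,
                 margin: Optional[float] = None, relative: bool = False) -> list[int]:
    """Select every plausible target for one source phrase.

    scores: candidate target index -> dependency score.
    threshold: keep targets whose score is >= threshold.
    margin: keep targets whose score is within `margin` of the best score,
            as an absolute difference, or as a fraction of the best if relative=True.
    Exactly one of threshold / margin must be given.
    """
    if (threshold is None) == (margin is None):
        raise ValueError("give exactly one of threshold or margin")
    if not scores:
        return []
    if threshold is not None:
        keep = [t for t, s in scores.items() if s >= threshold]
    else:
        best = max(scores.values())
        allowed = margin * best if relative else margin
        keep = [t for t, s in scores.items() if best - s <= allowed]
    return sorted(keep)


# Toy scores for source "私は" against 買った(3), 本を(4), 読んだ(5).
toy = {3: 0.91, 4: 0.20, 5: 0.85}
print(select_heads(toy, threshold=0.5))  # [3, 5]
```

With the threshold rule, both “買った” and “読んだ” are kept as targets of “私は”, which is the multi-target outcome shown in FIG. 8B.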

According to the first embodiment of the present invention described above, predicate-argument structure analysis identifies the predicate-argument structure while efficiently filling in information missing from it, without using a large-scale, high-accuracy case frame dictionary. This avoids model complexity and improves computational efficiency. It also makes highly accurate information retrieval and information extraction possible.

(Second Embodiment)
Next, the second embodiment of the present invention is described.

FIG. 9 shows the detailed configuration of the integrated dependency analysis means 24 according to the second embodiment of the present invention. The integrated dependency analysis means 24 includes phrase pair selection means (phrase pair selection unit) 401, dependency score calculation means (dependency score calculation unit) 402, dependency relation selection means (dependency relation selection unit) 403 and antecedent generation means (antecedent generation unit) 404. In FIG. 9, components identical to those in FIG. 3 are given the same reference numerals, and redundant descriptions are omitted where appropriate.

For each phrase in the phrase string received from the phrase analysis means 23, the antecedent generation means 404 determines whether the phrase is likely to be the antecedent of a zero pronoun, and generates an antecedent according to the result.

Specifically, the antecedent generation means 404 first calculates, for each phrase, the degree to which the phrase is likely to be the antecedent of a zero pronoun. This degree is calculated with a statistical model for antecedent generation, which is learned by a machine learning method such as a support vector machine or a decision tree from text in which the phrases serving as antecedents of zero pronoun are annotated as such. The more likely a phrase is to be the antecedent of a zero pronoun, the larger the value. If the calculated degree exceeds a predetermined threshold, an antecedent is generated from that phrase. The generated antecedent is sent to the phrase pair selection means 401 together with each phrase of the phrase string received from the phrase analysis means 23.
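A minimal sketch of this thresholded generation step follows. The likelihood function is a toy stand-in for the learned model, and the `rewrite` hook is hypothetical: the text does not state how the surface form of a generated antecedent is derived (FIG. 12 yields “私が” from “私は”, while FIG. 10 yields “太郎は” from “太郎が”), so the example rule below only reproduces the FIG. 12 output.

```python
from typing import Callable


def generate_antecedents(phrases: list[str],
                         likelihood: Callable[[int, list[str]], float],
                         threshold: float,
                         rewrite: Callable[[str], str] = lambda p: p) -> list[dict]:
    """Return the original phrases plus generated zero-pronoun antecedents.

    likelihood(i, phrases): degree to which phrase i is likely to be the antecedent
        of a zero pronoun (stands in for the learned statistical model).
    threshold: an antecedent is generated only when the degree exceeds it.
    rewrite(phrase): surface form of the generated antecedent (hypothetical hook).
    """
    nodes = [{"text": p, "generated": False, "origin": i} for i, p in enumerate(phrases)]
    for i, p in enumerate(phrases):
        if likelihood(i, phrases) > threshold:
            nodes.append({"text": rewrite(p), "generated": True, "origin": i})
    return nodes


# Toy degrees for the FIG. 12 sentence; real values would come from a trained model.
PHRASES = ["私は", "昨日", "東京で", "買った", "本を", "読んだ"]
TOY_DEGREES = {0: 0.9, 4: 0.7}


def toy_likelihood(i: int, phrases: list[str]) -> float:
    return TOY_DEGREES.get(i, 0.1)


def topic_to_subject(p: str) -> str:
    # Hypothetical rule reproducing the FIG. 12 output (私は -> 私が).
    return p[:-1] + "が" if p.endswith("は") else p


nodes = generate_antecedents(PHRASES, toy_likelihood, 0.5, topic_to_subject)
```

All returned nodes, original and generated, would then be handed to the phrase pair selection step.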

With the above configuration, the integrated dependency analysis means 24 selects dependency relations with high dependency scores based on the antecedents generated by the antecedent generation means 404 and the phrase string received from the phrase analysis means 23.

FIG. 10 shows an example of the result of the integrated dependency analysis means 24 according to the second embodiment of the present invention for multiple sentences. It shows the results for the phrase string contained in multiple sentences (two in this example): “太郎が” (Taro, subject), “学校から” (from school), “帰ってきた” (came home), “すぐに” (immediately) and “外出した” (went out).

In the example of FIG. 10, by executing the processing described with reference to FIG. 9, the antecedent generation means 404 determines that the phrase “太郎が” is likely to be an antecedent and generates the antecedent “太郎は”. The generated antecedent “太郎は” is sent to the phrase pair selection means 401 together with each phrase of the phrase string (“太郎が”, “学校から”, “帰ってきた”, “すぐに”, “外出した”).

The phrase pair selection means 401 and the dependency score calculation means 402 function as in the first embodiment (see FIG. 3). The dependency relation selection means 403, however, differs from the first embodiment in that it selects a single dependency relation, for example the one with the highest dependency score. This makes it possible to use the efficient algorithms employed in conventional dependency analysis.
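The contrast with the first embodiment can be sketched as a per-source argmax, assuming the pair scores have already been produced by the score calculation step and that "one relation" means one target per source phrase, as FIGS. 11 and 13 show. This is a simple illustration with a hypothetical function name, not one of the efficient conventional algorithms the text alludes to but does not name.

```python
def select_single_heads(pair_scores: dict[tuple[int, int], float]) -> dict[int, int]:
    """Keep, for each source phrase, only its highest-scoring target.

    pair_scores maps (source, target) -> dependency score; ties keep the pair seen first.
    The result maps source -> target, so each source has at most one target.
    """
    best: dict[int, tuple[float, int]] = {}
    for (src, tgt), score in pair_scores.items():
        if src not in best or score > best[src][0]:
            best[src] = (score, tgt)
    return {src: tgt for src, (_, tgt) in best.items()}


# Toy scores over 太郎が(0) 学校から(1) 帰ってきた(2), echoing FIG. 5A.
scores = {(0, 1): 0.14, (0, 2): 0.80, (1, 2): 0.90}
print(select_single_heads(scores))  # {0: 2, 1: 2}
```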

FIG. 11 shows the dependency relations identified in the example of FIG. 10, with the dependency relation between a source phrase (rows) and a target phrase (columns) visualized as a matrix. The analysis result display means 25 displays a table such as the one in FIG. 11 on the output device 16 (a display or the like); the display format is not limited to that table. In FIG. 11, the underlined “太郎は” is the antecedent generated by the antecedent generation means 404.

As shown in FIG. 11, the integrated dependency analysis means 24 identifies, for each source phrase (row), a dependency relation with at most one target phrase (column) (marked ○ and ◎ in the figure).

FIG. 12 shows an example of the result of the integrated dependency analysis means 24 according to the second embodiment of the present invention for one sentence, namely for the phrase string “私は”, “昨日”, “東京で”, “買った”, “本を”, “読んだ”.

In the example of FIG. 12, by executing the processing described with reference to FIG. 9, the antecedent generation means 404 determines that the phrases “私は” and “本を” are likely to be antecedents and generates the antecedents “私が” and “本を”. The generated antecedents are sent to the phrase pair selection means 401 together with each phrase of the phrase string (“私は”, “昨日”, “東京で”, “買った”, “本を”, “読んだ”).

The phrase pair selection means 401 and the dependency score calculation means 402 function as in the first embodiment (see FIG. 3). The dependency relation selection means 403, however, differs from the first embodiment in that it selects one dependency relation for each antecedent: “私が” → “買った” for “私が”, and “本を” → “買った” for “本を”. For example, the dependency relation with the highest dependency score is selected.

FIG. 13 shows the dependency relations identified in the example of FIG. 12, with the dependency relation between a source phrase (rows) and a target phrase (columns) visualized as a matrix. In FIG. 13, the underlined “私が” and “本を” are the antecedents generated by the antecedent generation means 404.

As shown in FIG. 13, the integrated dependency analysis means 24 identifies, for each source phrase (row), a dependency relation with at most one target phrase (column) (marked ○ and ◎ in the figure).

According to the second embodiment of the present invention described above, predicate-argument structure analysis identifies the predicate-argument structure while efficiently filling in information missing from it, without using a large-scale, high-accuracy case frame dictionary. This avoids model complexity and improves computational efficiency. It also makes highly accurate information retrieval and information extraction possible.

The embodiments of the present invention have been described above. Each embodiment merely shows one example of how the present invention can be applied, and is not intended to limit the technical scope of the present invention to its specific configuration. Various modifications can be made without departing from the gist of the present invention.

21 Analysis request input means
22 Morphological analysis means
23 Phrase analysis means
24 Integrated dependency analysis means
25 Analysis result display means
401 Phrase pair selection means
402 Dependency score calculation means
403 Dependency relation selection means
404 Antecedent generation means
405 Dependency statistical model

Claims

A sentence structure analyzing apparatus that comprises a processor that executes a program and a memory that stores the program executed by the processor, and that analyzes the structure of an input sentence, the apparatus comprising:
morphological analysis means for dividing the input sentence into words in units of morphemes;
phrase analysis means for generating a phrase string composed of a plurality of phrases based on the words obtained by the morphological analysis means; and
dependency analysis means for analyzing dependency relations between the phrases generated by the phrase analysis means, wherein
the dependency analysis means selects arbitrary pairs of two different phrases from the phrase string generated by the phrase analysis means,
calculates a dependency score for each selected phrase pair and stores it in the memory, and
determines that each phrase pair whose dependency score is equal to or greater than a predetermined threshold has a dependency relation,
the dependency analysis means includes
antecedent generation means for generating an antecedent of a zero pronoun from the phrase string generated by the phrase analysis means, and
the dependency analysis means selects pairs each consisting of an antecedent of a zero pronoun generated by the antecedent generation means and a phrase included in the phrase string generated by the phrase analysis means,
calculates a dependency score for each selected pair, and
determines that the pair having the highest dependency score has a dependency relation.

The sentence structure analyzing apparatus according to claim 1, wherein the dependency analysis means selects, from the phrase string generated by the phrase analysis means, a phrase pair consisting of a first phrase and a second phrase located after the first phrase,
calculates a dependency score for the case where the selected first phrase is the source and the second phrase is the target, and,
when the calculated dependency score is equal to or greater than a predetermined threshold, determines that there is a dependency relation in which the first phrase is the source and the second phrase is the target.

The sentence structure analyzing apparatus according to claim 1, wherein the dependency analysis means selects, from the phrase string generated by the phrase analysis means, a phrase pair consisting of a first phrase and a second phrase located before the first phrase,
calculates a dependency score for the phrase pair if there is a dependency relation in which the selected first phrase is the target and the second phrase is the source, and,
when the calculated dependency score is equal to or greater than a predetermined threshold, determines that there is a dependency relation in which the first phrase is the source and the second phrase is the target.

The sentence structure analyzing apparatus according to claim 1, wherein the dependency analysis means generates a statistical model of dependency relations based on the phrase string generated by the phrase analysis means, and calculates the dependency score of each selected phrase pair based on the generated statistical model.

The sentence structure analyzing apparatus according to claim 1, wherein the antecedent generation means generates a statistical model for antecedent generation based on the phrase string generated by the phrase analysis means, calculates for each phrase, based on the generated statistical model, the degree to which the phrase is likely to be the antecedent of a zero pronoun, and uses the phrase with the highest degree as the antecedent.

A sentence structure analyzing method for analyzing the structure of an input sentence in a sentence structure analyzing apparatus comprising a processor that executes a program and a memory that stores the program executed by the processor, wherein
  the processor executes:
  a step of dividing the input sentence into words in units of morphemes;
  a step of generating a phrase string composed of a plurality of phrases based on the words obtained in the dividing step; and
  a step of analyzing dependency relations between the phrases generated in the step of generating the phrase string,
  in the step of analyzing dependency relations, the processor selects arbitrary pairs of two different phrases from the phrase string generated in the step of generating the phrase string,
  calculates a dependency score for each selected phrase pair and stores it in the memory, and
  determines that each phrase pair whose dependency score is equal to or greater than a predetermined threshold has a dependency relation,
  the step of analyzing dependency relations includes a step of generating an antecedent of a zero pronoun from the phrase string generated in the step of generating the phrase string, and
  in the step of analyzing dependency relations, the processor selects pairs each consisting of an antecedent of a zero pronoun generated in the step of generating the antecedent and a phrase included in the phrase string generated in the step of generating the phrase string,
  calculates a dependency score for each selected pair, and
  determines that the pair having the highest dependency score has a dependency relation.

A sentence structure analysis program that is used in a sentence structure analyzing apparatus comprising a processor that executes a program and a memory that stores the program executed by the processor, and that analyzes the structure of an input sentence, the program causing the processor to execute:
  a step of dividing the input sentence into words in units of morphemes;
  a step of generating a phrase string composed of a plurality of phrases based on the words obtained in the dividing step;
  a step of selecting arbitrary pairs of two different phrases from the phrase string generated in the step of generating the phrase string;
  a step of calculating a dependency score for each phrase pair selected in the selecting step and storing it in the memory;
  a step of determining that each phrase pair whose dependency score is equal to or greater than a predetermined threshold has a dependency relation;
  a step of generating an antecedent of a zero pronoun from the phrase string generated in the step of generating the phrase string; and
  a step of selecting pairs each consisting of an antecedent of a zero pronoun generated in the step of generating the antecedent and a phrase included in the phrase string generated in the step of generating the phrase string, calculating a dependency score for each selected pair, and determining that the pair having the highest dependency score has a dependency relation.