JP3682915B2

JP3682915B2 - Natural sentence matching device, natural sentence matching method, and natural sentence matching program

Info

Publication number: JP3682915B2
Application number: JP2001095891A
Authority: JP
Inventors: 靖博平山
Original assignee: 株式会社ジャストシステム
Priority date: 2001-03-29
Filing date: 2001-03-29
Publication date: 2005-08-17
Anticipated expiration: 2021-03-29
Also published as: JP2002297592A

Description

【０００１】
【発明の属する技術分野】
本発明は、自然文マッチング装置、自然文マッチング方法、及び自然マッチングプログラムに関し、例えば、音声入力などやキーボード入力などされた自然文の意味を解釈するものに関する。
【０００２】
【従来の技術】
従来、コンピュータにコマンドを入力する際には、コマンドを決められた文法に従って入力する必要があった。
ところが、近年の急激なソフトウェア技術とハードウェア技術の発展により、人間の自然な言語（自然言語）を解釈してコマンドを特定することが可能となってきた。
【０００３】
この技術によって、例えばユーザがテレビに「消してくれ」とか、「５チャンネルが見たい」などと発話要求すると、テレビに内蔵されたマイクがユーザの発話要求をピックアップする。
そして、ユーザの入力した自然文はテレビに内蔵された事例文と自然マッチング装置によりマッチングされ、マッチングされた事例文から「テレビを消す」とか「チャンネルを５に合わせる」などのコマンドを特定する。
【０００４】
自然文マッチング装置は、事例文とユーザの入力文を比較し、最も近い事例文から回答を抽出し、調整を加えることで、入力文に対する解析結果を得ることができるものである。このようなものとして、例えば、特開平１１−３１１４９号広報のＩＩＦ（ＩｎｔｅｌｉｇｅｎｔＩｎｔｅｒｆａｃｅ）方式によるものや、キーワードによりマッチングするものなどがある。
【０００５】
【発明が解決しようとする課題】
しかし、ＩＩＦ方式では、述語動詞が不明確な場合は正しく解析できないことがある。
例えば、「文書を検索したい」と入力した場合はマッチングできるが、「検索したいのは文書」というように言い回しをが異なると（この場合倒置）、意味的には同じであるが正しくマッチングできない場合がある。
また、「平山さんの文書を検索したい」という入力文はマッチングできるが、「平山文書検索」というように、入力文が必要な助詞や活用形を含んでいない場合も正しくマッチングできない場合がある。
【０００６】
更に、ユーザの自然言語による音声入力で文を入力することを想定すると、助詞の欠落や語句の誤解析・誤認識の可能性がある。更に、話し言葉では、活用形が標準と異なる場合や不要語が含まれる場合などもある。
これらのことから、入力される文は正しい助詞や活用となっていないものがあり、これにより、入力文が誤解析、誤認識される場合がある。
このように、従来の自然文マッチング装置では、入力文が倒置表現になったもの、助詞の欠落したもの、標準でない活用形を含むものへの対応や、また、誤解析、誤認識された語句への対応が十分ではなかった。
【０００７】
また、キーワードマッチングを利用した場合は、語句の順序や係り受けが考慮されないという問題があった。
例えば、事例文が「送った文書が見たい」であった場合、キーワードは「送る」、「文書」、「見る」である。一方、入力文が「文書を送る」であった場合、キーワードが「文書」、「送る」であり、「送った文書が見たい」という事例文もヒットしてしまう。このため、キーワードマッチングではマッチングの精度はあまり高くない。なお、事例文とはユーザが入力することを想定した自然文である。
【０００８】
そこで、本発明の目的は、意味的な情報や係り受けの情報を用いて事例文と入力された自然文（入力文）とを柔軟にマッチングすることができ、更に、事例文の数が多くなった場合でも効率よくマッチングできる自然文マッチング装置、自然文マッチング方法、及び自然文マッチングプログラムを提供することである。
【０００９】
【課題を解決するための手段】
本発明は、前記目的を達成するために、請求項１に記載の発明では、入力された自然文を取得する自然文取得手段と、前記自然文取得手段にて取得した自然文を文節に区分して文節リストを生成する自然文文節区分手段と、前記自然文文節区分手段で生成した文節リストと、当該文節の係り受け情報を予め記憶してある格フレームと、の一致度により前記文節をランク付けするランク付け手段と、前記自然文文節区分手段にて区分された文節の係り受け情報を表層格、及び深層格にて取得して前記自然文の文構造を取得する自然文文構造取得手段と、回答に対応付けられた事例文に関して、文節の係り受け情報を表層格、及び深層格にて取得することにより文構造を取得する事例文文構造取得手段と、前記係り受け情報が一致した文節の文節数を用いて、前記自然文文構造取得手段にて取得した文構造と前記事例文文構造取得手段にて取得した文構造の一致度を取得する一致度取得手段と、前記一致度取得手段にて取得した一致度を用いて前記回答を特定する回答特定手段と、を具備した自然文マッチング装置であって、前記一致度取得手段は、一致する文節の文節数を数える際に、前記ランク付け手段によって文節リストと格フレームの一致度の低いランクにランク付けされた文節ほど文節数の値を低く調節することを特徴とする自然文マッチング装置を提供する。
請求項２に記載の発明では、前記ランク付け手段が、前記自然文文節区分手段で取得した文節リストと、当該文節に関する格フレームが、表層格と表記の両方で一致するか比較し、一致しない場合は、更に、表層格、又は表記の何れかが一致するかを比較し、表層格、又は表記の何れか一方が一致する場合は、表層格と表記の両方が一致した場合よりも文節のランクを小さくすることを特徴とする請求項１に記載の自然文マッチング装置を提供する。
請求項３に記載の発明では、前記一致度取得手段が、前記文節数を数える際に、前記自然文の係り受けを構成する前後の文節と前記事例文の係り受けを構成する前後の文節との一致を比較し、係り受けの前後のどちらかしか一致していない場合は、係り受けの前後の何れもが一致している場合よりも文節数を低く、係り受けの前後の両方とも一致しない場合は、係り受けの前後のどちらかしか一致していない場合よりも文節数を更に低く調節することを特徴とする請求項１、又は請求項２に記載の自然文マッチング装置を提供する。
請求項４に記載の発明では、前記一致度取得手段が、前記自然文文構造取得手段にて取得した文構造と、前記事例文文構造取得手段にて取得した事例文の文構造とを、表層格又は深層格の少なくとも一方を用いてマッチングして前記一致度を取得することを特徴とする請求項１、請求項２又は請求項３に記載の自然文マッチング装置を提供する。
請求項５に記載の発明では、前記自然文文節区分手段にて区分された文節に含まれる語句に語彙情報を付与する語彙情報付与手段を更に備え、前記一致度取得手段は、前記語彙情報付与手段にて当該語句に付与された語彙情報を用いて前記自然文の文構造と前記事例文の文構造をマッチングして前記一致度を取得することを特徴とする請求項１から請求項４までのうちの何れか１の請求項に記載の自然文マッチング装置を提供する。
請求項６に記載の発明では、前記語彙情報付与手段が、前記文節に含まれる語句に、当該語句に対応する同義語、類義語、多義語、同音異義語、概念情報のうち、少なくとも１つを関連付けることを特徴とする請求項５に記載の自然文マッチング装置を提供する。
請求項７に記載の発明では、前記事例文文構造取得手段が、前記事例文を取得する事例文取得手段と、前記事例文取得手段にて取得した事例文を文節に区分する事例文文節区分手段と、を具備したことを特徴とする請求項１から請求項６までのうちの何れかの１の請求項に記載の自然文マッチング装置を提供する。
請求項８に記載の発明では、前記一致度取得手段が、前記事例文文構造取得手段にて取得した事例文構造を用いて、事例文に含まれる語句を、当該語句の表層格又は深層格の少なくとも一方を用いて、前記自然文の文構造と前記事例文の文構造をマッチングするためのテーブル作成するテーブル作成手段と、を更に具備したことを特徴とする請求項１から請求項７までのうちの何れかの１の請求項に記載の自然文マッチング装置を提供する。
請求項９に記載の発明では、自然文取得手段と、形態素解析手段と、自然文文節区分手段と、ランク付け手段と、自然文文構造取得手段と、事例文文構造取得手段と、一致度解析手段と、回答特定手段と、を備えたコンピュータにおいて、
前記自然文取得手段によって、入力装置より入力された自然文をメモリに格納する自然文格納ステップと、前記形態素解析手段によって、前記自然文格納ステップで格納した前記自然文を形態素解析して形態素列を生成し、前記生成した形態素列をメモリに格納する形態素解析ステップと、前記自然文文節区分手段によって、前記形態素解析ステップで格納した前記形態素列の自立語と付属語を合わせて文節リストを生成し、前記生成した文節リストをメモリに格納する文節解析ステップと、前記ランク付け手段によって、前記文節解析ステップで格納した前記文節リストと、記憶装置に記憶してある格フレームと、の一致度を解析して前記文節をランク付けし、前記ランクをメモリに格納するランク付けステップと、前記自然文文構造取得手段によって、前記文節解析ステップで格納した前記文節リストの文節の係り受け情報を、表層格、及び深層格にて特定することにより前記自然文の文構造を解析し、前記解析の結果得られた文構造をメモリに格納する自然文文構造解析ステップと、前記事例文文構造取得手段によって、回答に対応付けられた事例文を文節に区分し、前記区分した文節の係り受け情報を表層格、及び深層格にて解析し、解析の結果得られた文構造をメモリに格納する事例文文構造解析ステップと、前記一致度解析手段によって、前記自然文文構造解析ステップでメモリに格納した文構造と、前記事例文文構造解析ステップでメモリに格納した文構造の一致度を、係り受け情報が一致した文節の文節数を用いて解析し、解析した一致度をメモリに格納する一致度解析ステップと、前記回答特定手段によって、前記一致度解析ステップでメモリに格納した一致度を用いて前記回答を特定する回答特定ステップと、を行う自然文マッチング方法であって、前記一致度解析ステップで一致する文節の文節数を数える際に、前記ランク付けステップでメモリに格納したランクによって文節リストと格フレームの一致度の低いランクにランク付けされた文節ほど文節数の値を低く調節することを特徴とする自然文マッチング方法を提供する。
請求項１０に記載の発明では、コンピュータのメモリにロードされてＣＰＵで実行されることにより、入力装置より入力された自然文をメモリに格納する自然文格納機能と、前記自然文格納機能によって格納した、前記自然文を形態素解析して形態素列を生成し、前記生成した形態素列をメモリに格納する形態素解析機能と、前記形態素解析機能によって格納した前記形態素列の自立語と付属語を合わせて文節リストを生成し、前記生成した文節リストをメモリに格納する文節解析機能と、前記文節解析機能によって格納した前記文節リストと、記憶装置に記憶してある格フレームと、の一致度を解析して前記文節をランク付けし、前記ランクをメモリに格納するランク付け機能と、前記文節解析機能によって格納した前記文節リストの文節の係り受け情報を、表層格、及び深層格にて特定することにより前記自然文の文構造を解析し、前記解析の結果得られた文構造をメモリに格納する自然文文構造解析機能と、回答に対応付けられた事例文を文節に区分し、前記区分した文節の係り受け情報を表層格、及び深層格にて解析し、解析の結果得られた文構造をメモリに格納する事例文文構造解析機能と、前記自然文文構造解析機能でメモリに格納した文構造と、前記事例文文構造解析機能でメモリに格納した文構造の一致度を、係り受け情報が一致した文節の文節数を用いて解析し、解析した一致度をメモリに格納する一致度解析機能と、前記一致度解析機能でメモリに格納した一致度を用いて前記回答を特定する回答特定機能と、をコンピュータで実現する自然文マッチングプログラムであって、前記一致度解析機能で一致する文節の文節数を数える際に、前記メモリに格納したランクによって文節リストと格フレームの一致度の低いランクにランク付けされた文節ほど文節数の値を低く調節することを特徴とする自然文マッチングプログラムを提供する。
【００１０】
以上の構成により、意味的な情報や係り受けの情報を用いることにより、意味的に同じである文は似ていると判断でき、音声入力の場合の誤認識、や誤解析（同音異義語などにより生じる）や、語句が省略されている場合でも、柔軟に事例文と自然文のマッチングを行うことができる。
【００１１】
【発明の実施の形態】
本実施の形態は、表層と深層を考慮して自然文と事例文をマッチングすることによりマッチングの適正化を行うものである。また、マッチングの際に多数の事例と照合するためのソフトウェアの効率化を図っている。
【００１２】
自然文の深層（自然文の意味構造）をも考慮することにより、より柔軟に自然文と事例文をマッチングすることができる。
例えば「田中さんにメールを送って」と、これを倒置した「メールを送って、田中さんに」は、共に「田中さんにメールを送る」という事例文に対応するものである。
従来は、倒置の形で入力された自然文に事例文を対応させるためには、例えば倒置の形の事例文を用意するなど、事例文に倒置の場合にも対応できるようにしておく必要があった。
しかし、深層をも考慮すると「メールを送って、田中さんに」を「田中さんにメールを送る」にマッチングすることができ、事例文の辞書を効率化することができる。
また、自然文が不完全である場合なども深層の構造を用いることに適正にマッチングすることが可能である。
【００１３】
以下に、本発明の実施の形態を図１から図１９までを参照しながら詳細に説明する。
図１は、本実施の形態に係る自然文マッチング装置１の構成の一例を示した図である。
自然文マッチング装置１は、中央処理装置２、入出力部３、ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）４、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）５、記憶部７などが、バスライン６によって接続されて構成されている。
【００１４】
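The input/output unit 3 is connected to input/output devices and exchanges information between the natural sentence matching device 1 and the outside.
When the natural sentence matching device 1 is built into a television, for example, the input/output unit 3 comprises a microphone that picks up the user's utterance request (an instruction from the user to the television, phrased as a natural sentence) and an interface that sends the information obtained by matching to the control unit of the television.
When the natural sentence matching device 1 is installed in a personal computer, the input/output unit 3 comprises, for example, a microphone or keyboard for receiving the user's natural-sentence requests and a display for presenting response sentences to the user.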
【００１５】
ＲＯＭ５は、読み出し専用のメモリであり、自然文マッチング装置１を動作させるための基本的なプログラムなどが記憶されている。
ＲＡＭ４は、読み書き可能なメモリであって、中央処理装置２のワーキングメモリを提供したり、記憶部７に記憶されたプログラムやデータをロードして記憶したりなどする。
【００１６】
記憶部７は、例えばハードディスクやその他の不揮発性のメモリなどによって構成されている。記憶部７は、自然文マッチングプログラムやその他のプログラムを記憶したプログラム部８と、自然文をマッチングする際に使用する語彙辞書、格フレーム辞書、属性辞書やその他のデータを記憶したデータベース９などによって構成されている。
また、自然文マッチング装置１が、例えば音声ワープロに使用される場合は、プログラム部８にワープロソフトを記憶することもできる。
【００１７】
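The central processing unit 2 performs natural sentence matching and controls data input and output at the input/output unit 3 in accordance with the programs stored in the ROM 5 and the RAM 4.
The bus line 6 is the transmission medium over which the central processing unit 2 exchanges data with the input/output unit 3 and the other units.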
【００１８】
図２は、自然文マッチングシステム１５の構成を示した図である。
自然文マッチングシステム１５は、図１のプログラム部８に格納された自然文マッチングプログラムがＲＡＭ４にロードされ、ソフトウェア的に実現されたものである。
また、自然文マッチングシステム１５の各構成要素をハードウェアによって構成することもできる。
【００１９】
以下に、自然文マッチングシステム１５の各コンポーネントの処理内容について概要を説明する。各コンポーネントの具体的な処理例は後ほど説明する。
まず、ユーザによってマイクロホンやキーボードなどから入力された入力文は、形態素解析コンポーネント１７に入力される。この入力文は、人間の自然な言語である自然文で構成されている。
【００２０】
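The morphological analysis component 17 performs morphological analysis on the input natural sentence and outputs the result to the clause analysis component 18 as a morpheme list. A morpheme is a unit finer than a clause (bunsetsu): the words are divided down to the level of independent words and attached words. The clause analysis component 18 creates a clause list from the morpheme list. A clause is basically formed by combining the independent words and attached words in the morpheme list.
Because they are needed later when concepts such as personal names and place names are processed, the concrete numerical values, personal names, place names, and similar information obtained from the morphological analysis are also attached to the clause list.
Normalization of alphabetic characters, katakana, symbols, and the like is also performed. Normalization here means unifying full-width and half-width character codes, upper-case and lower-case letters, and variant kanji forms to one fixed form. For example, under a policy that converts alphabetic characters to half-width lower case, half-width katakana to full-width, and variant kanji to Jōyō kanji, the full-width「Ａｌｐｈａｂｅｔ」becomes the half-width「alphabet」, the half-width「ｶﾀｶﾅ」becomes the full-width「カタカナ」, and「渡邊」becomes「渡辺」.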
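The normalization described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not say how normalization is implemented, Unicode NFKC folding is swapped in for the width conversion, and `VARIANT_KANJI` is a hypothetical one-entry table standing in for a full variant-kanji dictionary.

```python
import unicodedata

# Hypothetical variant-kanji table; a real system would need a far larger one.
VARIANT_KANJI = {"邊": "辺"}

def normalize(text: str) -> str:
    """Unify character width, letter case, and variant kanji."""
    # NFKC folds full-width ASCII to half-width and half-width katakana to full-width.
    folded = unicodedata.normalize("NFKC", text)
    # Lower-casing only affects cased letters; kana and kanji pass through unchanged.
    folded = folded.lower()
    # Replace variant kanji one character at a time.
    return "".join(VARIANT_KANJI.get(ch, ch) for ch in folded)
```

NFKC also rewrites other compatibility characters (circled digits, for example), so a production system might prefer an explicit conversion table.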
【００２１】
語彙処理コンポーネント１９は、文節解析コンポーネント１８から文節リストを取得し、語彙辞書２５を用いて該文節リストに意味的な情報を付与していく。意味的な情報としては、例えば、同義語、類義語、多義語、同音異義語、概念情報などがある。これらの情報は、語彙辞書２５にテーブル化されて記憶されている
概念情報には、赤や青などの概念である色や西や東などの概念である方向などのほか、地名や人名などの特殊概念が存在する。後に説明するように、本実施の形態では、特殊概念を用いて、形態素解析時に数値、人名、地名なども概念処理できるようにした。また、後に述べるように、例えば、９時２０分などの時間に関する表現も概念に含めることができる。
【００２２】
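The case frame processing component 20 obtains the clause list with semantic information from the vocabulary processing component 19 and determines, from surface cases and concepts, the words that appear to be the objects of each verb. At this point it can also attach deep-case information to the clause list.
For example, when the input sentence is「細川にメールを送る」(send a mail to Hosokawa), the verb is「送る」(send), and its object is「メール」(mail), which is the「を格」(wo-case) as a surface case and the「対象格」(object case) as a deep case.
The object of a verb is normally written in the form「〜を」, and this is called the「を格」as a surface case. Semantically the object is the target of the verb's action, so it is called the「対象格」as a deep case.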
【００２３】
また、格フレーム辞書２６には、様々な語句に対応する格フレームが記憶してある。格フレームとは、例えば、「送る」という語句は、表層格では「〜に〜を送る」又、深層格では「（相手）に（対象格）を送る」というフレーム（構造）を持ち、「に格」、「相手格」には、人名という概念が対応し、「を格」、「対象格」には、メール、手紙などが対応するといったことがテーブルとなって記憶されたものである。詳細は、後ほど述べる。
格フレーム処理コンポーネント２０は、文節リストの目的語と思われる語句を決定した後、格フレーム辞書２６を参照して、どの程度、入力文が格フレームにマッチしているかを判断する。
【００２４】
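When deciding from the case frame information in the case frame dictionary 26 which case frame the input sentence matches, the case frame information or the vocabulary information may be insufficient. In such cases the matching conditions are relaxed, for example by matching on deep-case information alone.
Relaxing the conditions in this way allows a reasonable match to be obtained even for input that would otherwise be hard to match.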
【００２５】
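When the input sentence is matched against a case frame, a match on both the surface case and the concept (or notation) is rank 1. If there is no rank 1 match, matches on the concept (or notation) alone or on the surface case alone are sought, and these are rank 2.
If there is neither a rank 1 nor a rank 2 match, general dependency information is adopted, and this is rank 3. General dependency information means rules such as "a「を格」clause depends on a verb" and "a「に格」clause depends on a verb or a sa-hen noun".
As a result of case frame processing, the case frame processing component 20 generates a sentence structure that carries the deep-case information of the case frames and the rank at which each match was made.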
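The three ranks above reduce to a small decision function. The sketch below is illustrative only; the function name and its boolean parameters are hypothetical and do not come from the patent text.

```python
def case_frame_rank(surface_case_matches: bool, concept_or_notation_matches: bool) -> int:
    """Return the rank of a clause/case-frame match (1 is best, 3 is worst)."""
    if surface_case_matches and concept_or_notation_matches:
        return 1  # both the surface case and the concept (or notation) match
    if surface_case_matches or concept_or_notation_matches:
        return 2  # only one of the two matches
    return 3      # fall back to general dependency rules
```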
【００２６】
属性付与コンポーネント２１は、文構造（文節）の情報に自然文マッチングシステム１５を使用しているアプリケーションソフトや装置などに依存した情報、例えば、コマンドのパラメータの情報などを付与する。これらの情報は属性辞書２７に記憶されている。
特殊概念を属性とした場合の値は、特殊概念の値をそのまま属性値とすることができる。例えば、概念で処理した人名、地名、数値、時間などは、入力された値をそのまま属性にすることができる。
【００２７】
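The sentence structure that the matching processing component 22 receives contains vocabulary information, case frame information, the rank obtained when matching against the case frames, attribute information, and so on.
The matching processing component 22 obtains the degree of match between the input sentence and each case sentence and adopts the answer of a case sentence with a high degree of match. The degree of match expresses how closely the two sentence structures agree and is calculated from the clause information and the dependency information.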
【００２８】
更に、具体的には、属性付与コンポーネント２１から属性を付与された文構造を受け取り、訓練コーパス（ユーザが入力すると想定される自然言語とこれに対応する回答を組にしたもの）の事例文から作成された回答辞書２８を用いて事例文と入力文をマッチングする。そして、一致度が高い事例文を特定し、これから得られる回答を出力情報２３として出力する。
回答とは、事例文などから特定されるコマンドや、属性付与コンポーネント２１によって付与されたパラメータなどが、セットになったものである。即ち、事例文がマッチングによって特定されると、コマンドやパラメータが特定されるのである。
【００２９】
自然文マッチングシステム１５の各種辞書のうち、語彙辞書２５、格フレーム辞書２６は、一般的な言語的な知識を辞書化したものであり、出力情報２３の出力対象であるアプリケーションソフトや装置が異なっていても共通して使用することができる。
一方、属性辞書２７と回答辞書２８は、アプリケーションソフトや装置に依存したものとなるため、アプリケーションソフト又は装置ごとに１つずつ作成することとなる。
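By dividing the dictionaries into a general linguistic part and a part that depends on the application software or device, the natural sentence matching system 15 keeps the size of the dictionaries that must be added small even when a new application or device is supported.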
【００３０】
図３は、語彙辞書２５の構成を示した図である。
語彙辞書２５は、同義語を登録した同義語部３１、類義語を登録した類義語部３２、多義語を登録した多義語部３３、同音異義語を登録した同音異義語部３４及び例えば時間や人名といった概念を登録した概念部３５などから構成されている。これら各部の内容については、以下の各処理とともに説明する。
【００３１】
次に語彙処理コンポーネント１９で行われる解析の具体例について説明する。
［同義語の解析例］
図４は、同義語部３１に作成された同義語テーブル４１を示している。
同義語部３１は、同じ意味を表す語句（同義語）を集めたものであり、それぞれの同義語がそれらを代表する代表語に対応付けられてテーブル化されている。
例えば、「購入する」は「買う」と同義語であり、「使う」は、「使用する」と同義語である。
今、例えば入力文が「パソコンを使う」であったとする。まず、形態素解析コンポーネント１７により、
形態素解析→パソコン／を／使／う
と解析され、次いで文節解析コンポーネント１８により
文節解析→パソコンを／使う
と、文節に区切られる。次いで語彙処理コンポーネント１９により、
パソコンを／使う（同義語＝使用する）
というように文節リストに同義語の代表語（ここでは、使用する）が付与される。
【００３２】
［類義語の解析例］
図５は、「探す」、「使用する」を代表語とした場合の類義語テーブル４３を示したものである。
類義語部３２には、意味的には類似しているが完全には同じでない表現を処理するための類義語情報がテーブル化して記憶されている。
例えば、「探す」、「検索する」、「捜索する」、「探る」は互いに類義語であり、「使用する」、「利用する」、「活用する」、「用いる」は互いに類義語である。
類義語部３２では、類義語の中から代表語を１つ選び（例えば、最も一般的に使用される語句を選択するなどする）、それが類義語の情報としてテーブル化されて登録されている。
【００３３】
例えば、入力文が「パソコンを用いる」であった場合、類義語は同義語の場合と同様にして以下のように解析される。
形態素解析→パソコン／を／用／いる
文節解析→パソコンを／用いる
語彙処理→パソコンを／用いる（類義語＝使用する）
このように、語彙処理により文節リストに類義語の情報が付与される。
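The synonym and near-synonym annotation shown in the examples above can be sketched as a table lookup. This is a simplified illustration: the dictionaries below contain only the word pairs mentioned in the text, the clause is looked up by its surface form without any inflection handling, and the names `SYNONYMS`, `NEAR_SYNONYMS`, and `annotate` are hypothetical.

```python
# Hypothetical excerpts of the synonym table (FIG. 4) and near-synonym table (FIG. 5),
# mapping each word to its representative word.
SYNONYMS = {"使う": "使用する"}
NEAR_SYNONYMS = {
    "検索する": "探す", "捜索する": "探す", "探る": "探す",
    "利用する": "使用する", "活用する": "使用する", "用いる": "使用する",
}

def annotate(clauses):
    """Attach the representative word of each known synonym or near-synonym."""
    annotated = []
    for clause in clauses:
        info = {}
        if clause in SYNONYMS:
            info["synonym"] = SYNONYMS[clause]
        if clause in NEAR_SYNONYMS:
            info["near_synonym"] = NEAR_SYNONYMS[clause]
        annotated.append((clause, info))
    return annotated
```

For example, `annotate(["パソコンを", "用いる"])` leaves the first clause without annotation and attaches the near-synonym「使用する」to the second.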
【００３４】
［多義語の解析例］
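FIG. 6 shows an example of the polysemous word table 45.
A polysemous word is a word with more than one meaning. For example, the verb「引く」has several meanings, such as「引き算する」(subtract) and「引き寄せる」(pull toward oneself).
In the polysemous word section 33, such polysemous words are registered in table form.
For example, when the input sentence is「線を引く」(draw a line), the polysemous word is analyzed as follows.
Morphological analysis → 線／を／引／く
Clause analysis → 線を／引く
Vocabulary processing → 線を／引く (polysemous word = 引き算する, 引き寄せる)
In this way, polysemous-word information is attached to the clause list.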
【００３５】
［同音異義語の解析例］
図７は、同音異義語テーブル４７の一例を示した図である。
同音異義語に対する処理は、音声認識で異なる意味と解釈されてしまう可能性のある語句に対して同音異義語の情報を付与することにより行われる。
同音異義語としては、例えば「対照」、「対象」、「対称」や「掛ける」、「欠ける」、「描ける」、「書ける」などがある。
同音異義語部３４では、これらの同音異義語がテーブル化されて登録されている。
【００３６】
例えば、入力文が「図形を対象にする」であった場合、同音異義語は以下のように解析される。
形態素解析→図形／を／対象／に／する
文節解析→図形を／対象に／する
語彙処理→図形を／対象に（同音異義語＝対照、対称）／する
このように文節リストに同音異義語の情報が付与される。同音異義語の情報を付与することにより、例えば、音声入力などの場合に、誤解析の可能性を少なくすることができる。
【００３７】
［概念情報の解析例］
図８は、概念テーブル４９の一例を示した図である。
概念部３５では、例えば色や方向といった意味内容が同じものをまとめて扱えるように、概念テーブル４９に示したようにテーブル化されて登録されている。例えば、「上」、「下」、「右」、「左」の概念情報は「方向」であり、「赤」、「緑」、「青」の概念情報は「色」である。
特殊概念には、人名、地名、数値、時間などがある。これらの特殊概念のうち、人名、地名、数値は形態素解析時に付与された情報を元に概念情報を作成し、時間は、図示しない時間概念辞書に登録された時間テーブル５１からの時間の値を作成する。図９は、時間テーブル５１を示した図である。
図９に示したように、特殊概念は値を持つこともできる。また、複数文節にまたがるものはコンマで区切って表す。
【００３８】
例えば、入力文が「上に移動する」であった場合、概念情報は以下のように解析される。
形態素解析→上／に／移動／する
文節解析→上に／移動する
語彙処理→上に（概念＝方向）／移動する
【００３９】
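When the input sentence contains a time expression, as in「８月３日に更新したファイルを検索する」(search for files updated on August 3), concepts are analyzed as follows.
Morphological analysis → ８／月／３／日／に／更新／し／た／ファイル／を／検索／する
Clause analysis → ８／月／３／日に／更新した／ファイルを／検索する
Vocabulary processing → ８ (concept = number; 8)／月／３ (concept = number; 3)／日に／更新した／ファイルを／検索する
In this way, a number concept can hold a value.
Multiple clauses (８／月／３／日に) can also be processed together, as follows.
Vocabulary processing → ８／月／３／日に (concept = time; 2000/8/3)／更新した／ファイルを／検索する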
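A time concept spanning several clauses, as in the example above, could be extracted roughly as follows. This is a sketch under assumptions: the text does not explain how the year 2000 is chosen, so the year is passed in as a hypothetical `reference_year` parameter, and a regular expression stands in for the time table 51.

```python
import re
import unicodedata
from datetime import date

# Matches "<month>月<day>日" after the digits have been folded to half-width.
DATE_PATTERN = re.compile(r"(\d{1,2})月(\d{1,2})日")

def time_concept(text, reference_year):
    """Return the date named in the text, or None if there is none."""
    match = DATE_PATTERN.search(unicodedata.normalize("NFKC", text))
    if match is None:
        return None
    month, day = int(match.group(1)), int(match.group(2))
    return date(reference_year, month, day)
```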
【００４０】
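Further, when the input sentence contains a personal name, which is a special concept, as in「平山の文書を検索する」(search for Hirayama's documents), it is analyzed as follows.
Morphological analysis → 平山／の／文書／を／検索／する
Clause analysis → 平山の／文書を／検索する
Vocabulary processing → 平山の (concept = personal name; 平山)／文書を／検索する
In this way, personal-name and place-name concepts can hold values.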
【００４１】
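[Multiple clauses and multi-stage processing]
The vocabulary processing component 19 can process expressions that span several clauses, in the same way as in the analysis of time concepts, and can also attach information such as synonyms and near-synonyms from the synonym section 31, the near-synonym section 32, and so on.
For example, when the input sentence is「Ａ４用紙に３ページから出す」(output on A4 paper from page 3), it is analyzed as follows.
Morphological analysis → Ａ４／用紙／に／３／ページ／から／出す
Clause analysis → Ａ４／用紙に／３／ページから／出す
Vocabulary processing → Ａ４／用紙に (synonym = 紙)／３ (concept = number; 3)／ページから／出す (紙 + 出す → synonym = 印刷する)
The vocabulary processing component 19 applies a further processing stage to the synonym result「紙」and「出す」and attaches「印刷する」(print).
That is, the synonym table of FIG. 10(a) in the synonym section 31 attaches the synonym information「紙」to「用紙」, and the synonym table of FIG. 10(b) then attaches the synonym information「印刷する」to「紙に」and「出す」.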
【００４２】
次に格フレーム処理コンポーネント２０で行われる解析の具体例について説明する。
図１１は、格フレーム辞書２６に格納されている格フレーム情報テーブル５４の一例を示した図である。
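For example, the word (predicate)「送る」(send) is normally used in the form "send (object) to (partner)". The clause corresponding to (partner) is called the「に格」as a surface case and the「相手格」(partner case) as a deep case, and the clause corresponding to (object) is called the「を格」as a surface case and the「対象格」(object case) as a deep case.
In the case frame information table 54,「メール」(mail) and「手紙」(letter) are registered as notations for the「を格」/「対象格」of「送る」, and「人名」(personal name) is registered as the concept information for its「に格」/「相手格」.
The case frame processing component 20 matches the vocabulary-processed clause list against the case frame information stored in the case frame dictionary 26 and ranks the degree of the match.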
【００４３】
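For example, when the input sentence is「細川さんに送ったメールを転送する」(forward the mail sent to Mr. Hosokawa), the steps from morphological analysis to vocabulary processing proceed as follows.
Morphological analysis → 細川／さん／に／送／った／メール／を／転送／する
Clause analysis → 細川さんに／送った／メールを／転送する
Vocabulary processing → 細川さんに (concept = personal name)／送った／メールを／転送する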
【００４４】
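The case frame processing component 20 matches the vocabulary-processed clause list against the case frame information in the following steps.
Step 1: Using the case frame information table 54, obtain the case frame information derived from the clause「転送する」(forward).
For the object case, the「対象格」column of the「転送する」row of the case frame information table 54 in FIG. 11 gives「メールを」→「転送する」; the notation (メールを) matches, so this match is rank 1. ... (1)
For the partner case, the「相手格」column of the「転送する」row gives「細川さんに (concept = personal name)」→「転送する」; the concept (personal name) matches, so this match is rank 1. ... (2)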
【００４５】
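Step 2: Obtain the case frame information derived from the clause「メールを」.
The「語句」(word) column of the case frame information table 54 has no corresponding entry, so no case frame information is obtained from this clause. The word column lists words that can serve as predicates, and「メール」is a noun, so it does not appear there.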
【００４６】
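Step 3: Obtain the case frame information derived from the clause「送った」(sent).
The「相手格」column of the「送る」row of the case frame information table 54 gives「細川さんに (concept = personal name)」→「送った」; the concept matches, so this match is rank 1. ... (3)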
【００４７】
ステップ４：係り受けの発生していない文節を調べる。
係り受けの発生していない文節は「送った」である。つまり、「送った」より前の位置にあって、「送った」の相手格、対象格となる語はない。
一方、「送った」は動詞の連体形、即ち体言（名詞・代名詞）が連なる形なので、名詞、サ変名詞、又は未登録語に係る。
ここでは、一般的な係り受けを採用し、「送った」→「メールを」とする。「送った」は「メールを」の修飾語であり、ランク３とする。・・・（４）
【００４８】
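From (1) to (4), the candidates with the higher rank and the shorter distance between clauses are adopted as the dependency information (in this example, (1), (3), and (4) are adopted), and the sentence structure is created. FIG. 12 shows the sentence structure created by the case frame processing component 20.
As FIG. 12 shows, the meaning of the natural sentence input by the user is to "forward" a "mail", and that mail is the one that was "sent" "to Mr. Hosokawa".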
【００４９】
また、入力文が「東に送ったメールを転送する」であった場合は、以下のように解析される。
形態素解析→東／に／送／った／メール／を／転送／する
文節解析→東に／送った／メールを／転送する
語彙処理→東に（概念情報＝方向）／送った／メールを／転送する
格フレーム処理は以下のように行われる。
ステップ１：「転送する」の文節から得られる格フレーム情報を取得する。
格フレーム情報テーブル５４の「転送する」欄の「対象格」欄から「メールを」→「転送する」となり、表記で一致するのでこの一致はランク１である。・・・（１）
【００５０】
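Step 2: Obtain the case frame from the clause「メールを」. The case frame information table 54 yields no case frame for the clause「メールを」.
Step 3: Obtain the case frame from the clause「送った」. The case frame information table 54 yields no case frame for the clause「送った」either.
That is, among the clauses before「送った」(a clause that depends on「送った」must come before it), none corresponds to either the object case (メール, 手紙) or the partner case (concept information = personal name).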
【００５１】
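Step 4: Process the clauses for which no dependency has been produced.
The clause for which no dependency has been produced is「東に」. In this case the matching conditions are relaxed so that one of the surface case, the notation, and the concept is allowed not to match.
The「相手格」column of the「送る」row gives「東に (concept information = direction)」→「送った」; the surface case matches (に格) but the concept does not, so this match is rank 2. ... (2)
The「相手格」column of the「転送する」row likewise gives「東に (concept information = direction)」→「転送する」; the surface case matches (に格) but the concept information does not, so this match is also rank 2. ... (3)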
【００５２】
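Another clause for which no dependency has been produced is「送った」.「送った」is the attributive form of a verb, so it depends on a noun, a sa-hen noun, or an unregistered word. Here the general dependency is adopted, giving「送った」→「メールを」.「送った」is a modifier, and in this case the match is rank 3. ... (4)
The case frame processing component 20 adopts, from (1) to (4), the candidates with the higher rank and the shorter distance between clauses, and creates the sentence structure. FIG. 13 shows the resulting sentence structure. That is, the meaning of the user's input sentence is to "forward" a "mail", and that mail is the one that was "sent" to「東」. Whether「東」is a personal name, however, is not certain.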
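The selection among candidates (1) to (4) can be sketched as below. The text only says that higher-ranked and nearer candidates are preferred; keeping exactly one head per dependent clause and using rank first with distance as the tie-breaker are assumptions of this sketch, and the `Candidate` type and function name are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    dependent: int  # index of the depending clause
    head: int       # index of the clause it depends on
    rank: int       # case frame rank, 1 (best) to 3

def select_dependencies(candidates):
    """Keep, for each dependent clause, the best-ranked and then nearest head."""
    best = {}
    for c in candidates:
        key = (c.rank, abs(c.head - c.dependent))
        if c.dependent not in best or key < best[c.dependent][0]:
            best[c.dependent] = (key, c)
    return [best[d][1] for d in sorted(best)]

# Clauses of the example: 0 = 東に, 1 = 送った, 2 = メールを, 3 = 転送する
candidates = [
    Candidate(2, 3, 1),  # (1) メールを → 転送する
    Candidate(0, 1, 2),  # (2) 東に → 送った
    Candidate(0, 3, 2),  # (3) 東に → 転送する
    Candidate(1, 2, 3),  # (4) 送った → メールを
]
chosen = select_dependencies(candidates)
```

With these inputs,「東に」attaches to the nearer「送った」rather than to「転送する」, which agrees with the reading given for FIG. 13.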
【００５３】
次に、属性付与コンポーネント２１で行われる処理の具体例について説明する。
格フレーム処理コンポーネント２０によって作成された文構造に自然文マッチング装置１を使用しているアプリケーションソフトや装置のパラメータに依存した値に関する情報を付与する。
人名、地名、数値、時間などの概念情報は、そのまま属性として使用することもできる。また、これらの概念情報を属性とした場合は、語彙処理コンポーネント１９で概念情報の処理をした時に取得した概念情報の値を属性値として利用することもできる。また、属性付与コンポーネント２１での属性情報の処理を行う際に、独自に概念情報の値を取得する方法を採用しても良い。
【００５４】
図１４は、属性辞書２７に格納されている属性テーブル５７の一例を示した図である。属性テーブル５７では、語句と、その語句の概念を表す概念情報、及びその語句に対応したパラメータが組となって格納されている。例えば、「細線」は概念情報としては「線種」であり、細線は「線種」のうちのパラメータ１で表される。即ち、「細線」は「線種（１）」で表される。
同様に、語句「赤」は「色（０ｘ００００ｆｆ）」に対応する。
【００５５】
例えば、入力文が「細線を引く」であったとする。この入力文は、以下の手順で処理される。

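In this way, the attribute assignment component 21 refers to the attribute dictionary 27 and attaches attribute information (here, line type attribute = 1) to the sentence structure obtained from the case frame processing component 20.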
【００５６】
また、入力文が「細い線を引く」であった場合は、以下のように処理される。
形態素解析→細／い／線／を／引／く
文節解析→細い／線を／引く
語彙処理→細い／線を／引く
【数式１】

このように、複数の文節から属性情報を取得することもできる。
【００５７】
次に、マッチング処理コンポーネント２２で行われる処理の具体例について説明する。
マッチング処理コンポーネント２２は、訓練コーパスにある事例文の文構造と、属性付与コンポーネント２１から取得した文構造を比較して一致度を計算する。
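The degree of match expresses how closely two sentence structures agree. One possible measure is a matching index computed from the number of clauses in the answer candidate and the number of clauses that matched it, as in the following formula.
(matching index) = (number of matched clauses) / (number of clauses in the answer candidate)
When ranking the answer candidates, the matching index is not used alone: as the following relation indicates, candidates with a larger number of matched clauses are given priority.
(number of matched clauses) > (matching index)
That is, the answer with the largest number of matched clauses (the numerator of the matching index) is adopted as the matching result, and when the numbers of matched clauses are equal, the answers are compared by matching index.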
【００５８】
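In counting the matched clauses for the matching index, each clause was weighted according to how well it matched: 0.3 for a match on the deep case and 0.7 for a match on the notation or concept (0.4 when the match was made through a near-synonym).
【００５９】
Furthermore, for a clause whose rank in the case frame matching is low (rank 3, for example), the clause count is adjusted, for example by halving it.
To reflect the dependency information, the clause count is left unchanged when the clauses on both sides of the dependency match those of the answer candidate, multiplied by a further 2/3 when only one side matches, and multiplied by 1/3 when neither side matches.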
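The weighting rules above can be put together as in the following sketch. It is an illustration, not the patented implementation: it assumes the deep-case weight and the notation/concept weight add up (so a full match counts 1), that only rank 3 is halved because the text gives no factor for rank 2, and all function and parameter names are hypothetical.

```python
DEPENDENCY_FACTOR = {2: 1.0, 1: 2 / 3, 0: 1 / 3}  # sides of the dependency that match

def clause_score(deep_case, notation, via_near_synonym=False,
                 rank=1, dependency_sides_matched=2):
    """Weighted count contributed by one matched clause."""
    score = 0.0
    if deep_case:
        score += 0.3
    if notation:
        score += 0.4 if via_near_synonym else 0.7
    if rank == 3:
        score *= 0.5  # low case frame rank: halve the count
    return score * DEPENDENCY_FACTOR[dependency_sides_matched]

def matching_index(clause_scores, candidate_clause_count):
    """Return (numerator, matching index) for one answer candidate."""
    numerator = sum(clause_scores)
    return numerator, numerator / candidate_clause_count
```

For instance, two fully matched clauses, one of which matches on only one side of its dependency, against a three-clause candidate give a numerator of about 1.67 and an index of about 0.56.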
【００６０】
以下にマッチングの具体例を示す。
図１５に示したような回答１〜５からなる訓練コーパスを想定する。回答２は時間の概念を含み、回答３、５は人名の概念を含んでいる。
なお、検索コマンドにおけるパラメータ１は検索アイテム種類を表しており、「１＝４」の場合はメールである。パラメータ２は送信者名、パラメータ３は時間、パラメータ５は検索方向を表しており、「５＝１」の場合は降順（下方向）への検索である。送信コマンドにおけるパラメータ１は宛先を表している。
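Each case sentence in the training corpus 60 shown in FIG. 15 is processed through morphological analysis, clause analysis, vocabulary processing, case frame processing, and attribute assignment, and its sentence structure is created.
【００６１】
FIG. 16 shows the sentence structures obtained by analyzing each case sentence of the training corpus 60. From the top, they correspond to answers 1 to 5 in order.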
次に、マッチングを行うために必要な回答検索用の回答辞書２８を作成する。
本実施の形態では、マッチングの際に回答辞書２８を作成することとしたが、これは、あらかじめいろいろな事例文に対して作成し、記憶しておいても良い。
回答検索には、事例文の文構造と回答候補を結びつけた回答辞書２８を用いる。
図１７は、回答辞書２８に作成された回答検索テーブル６３の一例である。これは、図１６に示した文構造から作成されたものである。
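The answer search table 63 breaks the words of the case sentences down according to their sentence structures and groups them by word (notation or attribute).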
【００６２】
「表記／属性」欄は、訓練コーパス６０に現れる表記又は属性が記録されている。
「深層格」欄は、これらの表記又は属性の深層格が記録されている。「表層格」欄には、これらの表記又は属性の表層格が記されている。
「状態」欄の状態１、状態２、・・・は、深層格又は表層格によって分類された表記又は属性を区別するための表記である。「回答候補」欄は、各状態の元となった表記又は属性を含む回答である。「連続文節」は、各回答で、表記又は属性が接続する先の文節を状態で示したものである。
【００６３】
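FIG. 18 shows each answer candidate together with the command, parameters, conditions, and so on that correspond to it.
For example, the command corresponding to answer 1 is "search", and the value of parameter 1 is 4. Answer 1 is selected when the sentence structure of the input sentence contains the notation「検索する」as its predicate, that is, in the case of state 1 and state 2.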
【００６４】
以上のように、訓練コーパスから作成された回答辞書２８を用いて行うマッチングの具体例を示す。
入力文が「メールを検索する」の場合、形態素解析コンポーネント１７から属性付与コンポーネント２１までの各コンポーネントにより、属性を付与した文構造を次のように作成する。
（述語）検索する
（対象格）メール（を格）
【００６５】
次に、述語の「検索する」に該当する回答候補を回答検索テーブル６３の「検索する」欄から探す。その結果、回答１から回答４までが候補となる。
次に、対象格の「メール」に該当する回答候補を回答検索テーブル６３の「メール」欄から探す。その結果、回答１から回答５までが候補となる。
以上から回答候補として回答１から回答５までがありえることになる。
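The candidate lookup just described behaves like an inverted index from words to answers. The sketch below is a simplification: the real answer search table 63 also records deep case, surface case, states, and connected clauses, whereas this hypothetical `ANSWER_SEARCH_TABLE` keeps only the notation-to-answers mapping stated in the text.

```python
# Hypothetical, simplified excerpt of the answer search table 63:
# notation -> set of answer numbers whose case sentence contains it.
ANSWER_SEARCH_TABLE = {
    "検索する": {1, 2, 3, 4},
    "メール": {1, 2, 3, 4, 5},
}

def candidate_answers(words):
    """Collect every answer that shares at least one word with the input."""
    found = set()
    for word in words:
        found |= ANSWER_SEARCH_TABLE.get(word, set())
    return sorted(found)
```

Because each word is looked up once, the cost depends on the number of clauses in the input rather than on the number of case sentences, which is one way to read the efficiency claim made for the answer dictionary.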
【００６６】
次に各回答候補に対する一致度を計算する。
［回答１に対する一致度］
マッチング指数は次式のようになる。
（マッチング指数）＝２／２＝１．００
［回答２に対する一致度］
「メールを」の節にかかるものがないため「メールを」の文節は１×２／３となり、その結果、マッチング指数は次式のようになる。
（マッチング指数）＝（１×２／３＋１）／３＝０．５６
【００６７】
［回答３に対する一致度］
「メールを」の節にかかるものがないため「メールを」の文節は１×２／３となり、その結果、マッチング指数は次式のようになる。
（マッチング指数）＝（１×２／３＋１）／４＝０．４２
［回答４に対する一致度］
「検索する」の節にかかるものがないため「検索する」の文節は１×２／３となり、その結果、マッチング指数は次式のようになる。
（マッチング指数）＝（１×２／３＋１）／３＝０．５６
【００６８】
［回答５に対する一致度］
「メールを」の節は係り受けの前後が一致しないため１×１／３となり、その結果、マッチング指数は次式のようになる。
（マッチング指数）＝（１×１／３）／３＝０．１１
As a result, answer 1 has the highest degree of match, so the command of answer 1 is taken as the answer.
【００６９】
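Next, matching is described for the input sentence「細川さんのメールを検索する」(search for Mr. Hosokawa's mail).
First, the components from the morphological analysis component 17 through the attribute assignment component 21 analyze the input up to a sentence structure with attribute information attached.
For「細川さんのメールを検索する」, the analysis result is as follows.
(predicate) 検索する
(object case) メール (を格)
(???) 細川さん (の格) (personal name = 細川)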
【００７０】
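The deep case of the clause「細川さんの」is shown as (???) because「検索する」has no case frame for the surface case (の格) or the concept (personal name); since there was no rank 1 or rank 2 match, the general dependency was adopted.
Note that in this example the following case frame information is assumed not to exist.
(word) 検索する, (surface case) で格, (deep case) 対象格, (concept/notation) ＊人名
Next, the predicate「検索する」is looked up in the「検索する」row of the answer search table 63. As a result, answers 1 to 4 become candidates.
Next, the object-case「メールを」is looked up in the「メール」row of the answer search table 63. As a result, answers 1 to 5 become candidates.
Next, the limiting-case (限定格)「細川さん」or「＊人名」is looked up in the answer search table 63. No answer candidate is found.
Next,「細川さん」is looked up ignoring the deep-case and surface-case conditions, but again nothing is found.
Next,「＊人名」is looked up ignoring the deep-case and surface-case conditions. As a result, answers 3 and 5 become candidates.
From these results, answers 1 to 5 become the answer candidates.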
【００７１】
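Next, the matching processing component 22 calculates the degree of match for each answer candidate.
[Degree of match for answer 1]
The matching index is as follows.
(matching index) = (1 × 2/3 + 1) / 2 = 0.83
The numerator of the matching index is 1.67.
[Degree of match for answer 2]
(matching index) = (1 × 2/3 + 1) / 3 = 0.56
The numerator of the matching index is 1.67.
【００７２】
[Degree of match for answer 3]
The clause「細川さんの」does not match in deep case, has case frame rank 3, and its dependency target does not match, so it counts 0.7/2 × 2/3. The clause that depends on「メールを」does not match, so「メールを」counts 1 × 2/3. The matching index is therefore as follows.
(matching index) = (0.7/2 × 2/3 + 1 × 2/3 + 1) / 4 = 0.48
The numerator of the matching index is 1.90.
[Degree of match for answer 4]
The clauses depending on「検索する」are incomplete, so「検索する」counts 1 × 2/3. The matching index is therefore as follows.
(matching index) = (1 + 1 × 2/3) / 3 = 0.56
The numerator of the matching index is 1.67.
【００７３】
[Degree of match for answer 5]
The clause「細川さんの」does not match in deep case, has case frame rank 3, and its dependency target does not match, so it counts 0.7/2 × 2/3. The clause「メールを」matches on neither side of its dependency, so it counts 1 × 1/3. The matching index is therefore as follows.
(matching index) = (0.7/2 × 2/3 + 1 × 1/3) / 3 = 0.19
The numerator of the matching index is 0.57.
As a result, the answer with the largest matching-index numerator is answer 3, so answer 3 is taken as the answer. That is, answer 3 becomes the first answer candidate.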
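The ranking in this example can be reproduced with exact fractions. The sketch below takes the per-clause counts from the calculations above; ordering by numerator first and matching index second follows the text, while breaking any remaining tie by answer number is an assumption of this sketch, as are the function and variable names.

```python
from fractions import Fraction as F

def rank_answers(scores):
    """Order answers by numerator, then matching index, then answer number."""
    rows = []
    for answer, (counts, clause_total) in scores.items():
        numerator = sum(counts)
        rows.append((numerator, numerator / clause_total, answer))
    rows.sort(key=lambda row: (-row[0], -row[1], row[2]))
    return [answer for _, _, answer in rows]

low = F(7, 10) / 2 * F(2, 3)  # notation-only match, rank 3, one dependency side matched
scores = {
    1: ([F(2, 3), 1], 2),
    2: ([F(2, 3), 1], 3),
    3: ([low, F(2, 3), 1], 4),
    4: ([1, F(2, 3)], 3),
    5: ([low, F(1, 3)], 3),
}
order = rank_answers(scores)
```

Answer 3 comes first because its numerator (1.90) is the largest, even though its matching index (0.48) is lower than that of answer 1 (0.83).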
【００７４】
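Next, matching is described for an inverted input sentence such as「検索したいのはメール」(what I want to search for is mail).
First, the components from the morphological analysis component 17 through the attribute assignment component 21 analyze the input up to a sentence structure with attribute information attached.
For「検索したいのはメール」, the analysis result is as follows.
(predicate) none
(???) メール (φ格)
(???) の (は格)
(attributive) 検索する (desire)
The (???) above indicates that the deep case is undetermined because the surface case is「の格」. As the analysis result shows, an inverted sentence has nothing corresponding to a predicate.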
【００７５】
次に、マッチング処理コンポーネント２２は、「メール」には、深層格が設定されていないため深層格や表層格の条件を無視して（即ち、一致条件を緩和して）「メール」を回答辞書２８に作成された回答検索テーブル６３の「表記／属性」欄から探す。その結果、回答１から回答５までが候補となる。
次に、「の格」には、深層格が設定されていないため、マッチング処理コンポーネント２２は、深層格や表層格の条件を無視して「の」を回答検索テーブル６３の「表記／属性」欄から探す。その結果、「の」に該当するものは無い。
【００７６】
次に、マッチング処理コンポーネント２２は、回答検索テーブル６３の「表記／属性」欄が「検索する」で、「深層格」欄が連体修飾であるものを探す。その結果、該当するものは無い。
次に、マッチング処理コンポーネント２２は、「検索する」の深層格の条件を無視して回答検索テーブル６３の「表記／属性」欄で該当するものを探す。その結果回答１から回答４までが回答候補となる。
以上の検索結果から回答１から回答５までが回答候補となる。
【００７７】
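Next, the matching processing component 22 calculates the degree of match for each answer candidate.
[Degree of match for answer 1]
「検索する」matches in notation only, and neither side of its dependency matches, so it counts 0.7/2 × 1/3.
「メール」matches in notation only, and neither side of its dependency matches, so it counts 0.7/2 × 1/3.
The matching index is therefore as follows.
(matching index) = (0.7/2 × 1/3 + 0.7/2 × 1/3) / 2 = 0.12
The numerator of the matching index is 0.23.
【００７８】
[Degree of match for answer 2]
「検索する」matches in notation only, and neither side of its dependency matches, so it counts 0.7/2 × 1/3.
「メール」matches in notation only, and neither side of its dependency matches, so it counts 0.7/2 × 1/3.
The matching index is therefore as follows.
(matching index) = (0.7/2 × 1/3 + 0.7/2 × 1/3) / 3 = 0.08
The numerator of the matching index is 0.23.
【００７９】
[Degree of match for answer 3]
「検索する」matches in notation only, and neither side of its dependency matches, so it counts 0.7/2 × 1/3.
「メール」matches in notation only, and neither side of its dependency matches, so it counts 0.7/2 × 1/3.
The matching index is therefore as follows.
(matching index) = (0.7/2 × 1/3 + 0.7/2 × 1/3) / 4 = 0.06
The numerator of the matching index is 0.23.
【００８０】
[Degree of match for answer 4]
「検索する」matches in notation only, and neither side of its dependency matches, so it counts 0.7/2 × 1/3.
「メール」matches in notation only, and neither side of its dependency matches, so it counts 0.7/2 × 1/3.
The matching index is therefore as follows.
(matching index) = (0.7/2 × 1/3 + 0.7/2 × 1/3) / 3 = 0.08
The numerator of the matching index is 0.23.
【００８１】
[Degree of match for answer 5]
「メール」matches in notation only, and neither side of its dependency matches, so it counts 0.7/2 × 1/3.
The matching index is therefore as follows.
(matching index) = (0.7/2 × 1/3) / 3 = 0.04
The numerator of the matching index is 0.12.
【００８２】
From these calculations, the numerator of the matching index is the same for answers 1 to 4, and among them answer 1 has the highest matching index, so answer 1 is taken as the answer. That is, answer 1 becomes the first answer candidate.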
【００８３】
図１９は、自然文マッチングシステム１５の動作を示したフローチャートである。
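First, the natural sentence matching system 15 obtains the natural sentence input by the user (step 1), and the morphological analysis component 17 breaks the input sentence down into morphemes (step 2).
The morphological analysis component 17 outputs the morpheme list to the clause analysis component 18. The clause analysis component 18 divides the input sentence into clauses using the morpheme list and generates the clause list (step 30).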
【００８４】
次に、語彙処理コンポーネント１９が文節解析コンポーネント１８から文節リストを取得する。そして語彙処理コンポーネント１９は、語彙辞書２５に登録されている同義語や類義語などの情報を文節リストに付与し、格フレーム処理コンポーネント２０に出力する（ステップ４０）
次に、格フレーム処理コンポーネント２０は、格フレーム辞書２６を用いて、同義語や類義語などの情報が文節に付与された文節リストの文節から、表層格（「を格」、「に格」など）や深層格（「対象格」、「相手格」など）等の格フレーム情報を取得し、文構造を決定する（ステップ５０）。
【００８５】
次に、属性付与コンポーネント２１は、格フレーム処理コンポーネント２０から文構造を取得し、例えば、文中の「細線」に対して「線種（１）」というように、自然文マッチングシステム１５が組み込まれたアプリケーションソフトや装置に特有のパラメータを文構造に付与してマッチング処理コンポーネントに出力する（ステップ６０）。
【００８６】
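Next, the matching processing component 22 creates the answer dictionary 28 from the training corpus 29.
Each case sentence of the training corpus 29 is analyzed by the morphological analysis component 17 and the subsequent components, and a sentence structure is created in the same way as for the input sentence. The matching processing component 22 creates, in the answer dictionary 28, the answer search table 63 used to compare the sentence structures of the case sentences with that of the input sentence.
The matching processing component 22 then uses the answer search table 63 to calculate the degree of match between the sentence structure of the input sentence and those of the case sentences, and outputs as the answer the command obtained from the case sentence with the highest degree of match (step 80).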
【００８７】
The natural sentence matching device 1 according to the embodiment described above provides the following effects.
同じ意味や意味的に近い文は１の事例文でカバーできるので、訓練コーパス２９を作成するときに同じ意味の様々な表現の文を用意しなくても良い。このため、訓練コーパス作成のコストが削減できると共に、記憶装置の容量も節約することができる。
また、省略された語句や全く異なる語句が入力文中にあっても意味的に同じ部分が多ければ、マッチングすることにより省略された（又は不足している）語句を推定したり、また、同音異義語などによる音声入力の際の誤認識、誤解析された語句の推定が可能となる。
【００８８】
文節の意味的な情報と係り受けの情報を用いることにより、キーワードマッチングよりも精度の高いマッチングを行うことができる。
また、マッチングに一致度を用いるため、回答の順位付けができる。このため、多数の候補がある場合にユーザに一致度の高いものから回答を提示することができる。
更に、マッチング効率の良い方法を採用することにより事例の数が多くても、一事例ずつマッチングを行っていく場合に比べて処理時間が短くできる。
加えて、アプリケーションソフトを追加した場合は、アプリケーションソフトに依存した辞書（属性辞書と回答辞書）だけを追記すればよいので、少ない辞書サイズでアプリケーションに対応することができる。
また、回答辞書を用いることにより、事例文が多い場合でも効率よくマッチングすることができる。
【００８９】
【発明の効果】
本発明によれば、意味的な情報や係り受けの情報を用いて事例文と入力された自然文（入力文）とを柔軟にマッチングすることができ、更に、事例文の数が多くなった場合でも効率よくマッチングできる自然文マッチング装置、自然文マッチング方法、及び自然文マッチングプログラムを提供することができる。
【図面の簡単な説明】
【図１】図１は、本実施の形態に係る自然文マッチング装置１のハードの構成の一例を示した図である。
【図２】自然文マッチングシステムの構成を示した図である。
【図３】語彙辞書の構成の一例を示した図である。
【図４】同義語テーブルの一例を示した図である。
【図５】類義語テーブルの一例を示した図である。
【図６】多義語テーブルの一例を示した図である。
【図７】同音異義語テーブルの一例を示した図である。
【図８】概念テーブルの一例を示した図である。
【図９】時間テーブルの一例を示した図である。
【図１０】語彙処理で行われる多段処理を説明するための図である。
【図１１】格フレーム情報テーブルの一例を示した図である。
【図１２】格フレーム処理コンポーネントが作成した文構造の例を示した図である。
【図１３】格フレーム処理コンポーネントが作成した文構造の他の例を示した図である。
【図１４】属性テーブルの一例を示した図である。
【図１５】訓練コーパスの一例を示した図である。
【図１６】訓練コーパスの回答の文構造を示した図である。
【図１７】回答検索テーブルの一例を示した図である。
【図１８】回答に対応するコマンドなどを示した図である。
【図１９】自然文マッチングシステムの動作を示したフローチャートである。
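[Explanation of reference numerals]
1 Natural sentence matching device
2 Central processing unit
3 Input/output unit
4 RAM
5 ROM
6 Bus line
7 Storage unit
8 Program section
9 Database
15 Natural sentence matching system
16 Input information
17 Morphological analysis component
18 Clause analysis component
19 Vocabulary processing component
20 Case frame processing component
21 Attribute assignment component
22 Matching processing component
23 Output information
25 Vocabulary dictionary
26 Case frame dictionary
27 Attribute dictionary
31 Synonym section
32 Near-synonym section
33 Polysemous word section
34 Homonym section
35 Concept section
41 Synonym table
43 Near-synonym table
45 Polysemous word table
47 Homonym table
49 Concept table
51 Time table
54 Case frame information table
57 Attribute table
60 Training corpus
63 Answer search table
[0001]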
[Technical field of the invention]
The present invention relates to a natural sentence matching device, a natural sentence matching method, and a natural sentence matching program, for example ones that interpret the meaning of a natural sentence entered by voice or keyboard.
[0002]
[Prior art]
Conventionally, when inputting a command to a computer, it has been necessary to input the command according to a predetermined grammar.
However, recent rapid developments in software and hardware technologies have made it possible to identify commands by interpreting human natural language (natural language).
[0003]
With this technology, when a user says to a television, for example, "Turn it off" or "I want to watch channel 5", a microphone built into the television picks up the user's utterance request.
The natural sentence input by the user is then matched against the case sentences stored in the television by a natural sentence matching device, and a command such as "turn off the television" or "set the channel to 5" is identified from the matched case sentence.
[0004]
A natural sentence matching device compares case sentences with the user's input sentence, extracts the answer from the closest case sentence, and adjusts it to obtain an analysis result for the input sentence. Examples include a device based on the IIF (Intelligent Interface) method of Japanese Unexamined Patent Application Publication No. H11-31149 and devices that match by keywords.
[0005]
[Problems to be solved by the invention]
However, the IIF method may fail to analyze a sentence correctly when the predicate verb is unclear.
For example, the input「文書を検索したい」(I want to search for documents) can be matched, but when the wording differs, as in「検索したいのは文書」(what I want to search for is documents; an inversion in this case), the sentence may not be matched correctly even though its meaning is the same.
Likewise, the input「平山さんの文書を検索したい」(I want to search for Mr. Hirayama's documents) can be matched, but an input that lacks the necessary particles and inflections, such as「平山文書検索」(Hirayama document search), may not be matched correctly.
[0006]
Furthermore, when sentences are entered by natural-language voice input, particles may be dropped and words may be misanalyzed or misrecognized. In spoken language, inflected forms may also differ from the standard ones, and unnecessary words may be included.
For these reasons, some input sentences do not have correct particles or inflections, and the input sentence may therefore be misanalyzed or misrecognized.
Thus conventional natural sentence matching devices did not adequately handle input sentences that are inverted, lack particles, or contain non-standard inflected forms, nor words that have been misanalyzed or misrecognized.
[0007]
When keyword matching is used, there is the problem that word order and dependency relations are not taken into account.
For example, if the case sentence is「送った文書が見たい」(I want to see the document I sent), its keywords are「送る」(send),「文書」(document), and「見る」(see). If the input sentence is「文書を送る」(send a document), its keywords are「文書」and「送る」, so the case sentence「送った文書が見たい」is also hit. The matching accuracy of keyword matching is therefore not very high. A case sentence is a natural sentence that the user is expected to input.
[0008]
An object of the present invention is therefore to provide a natural sentence matching device, a natural sentence matching method, and a natural sentence matching program that can flexibly match case sentences with an input natural sentence (input sentence) using semantic information and dependency information, and that can match efficiently even when the number of case sentences is large.
[0009]
[Means for Solving the Problems]
In order to achieve the above object, according to the present invention, in the invention described in claim 1, a natural sentence acquisition unit that acquires an input natural sentence and a natural sentence acquired by the natural sentence acquisition unit are classified into phrases. The phrase is determined according to the degree of coincidence between the natural phrase segmentation means for generating the phrase list, the phrase list generated by the natural phrase segmentation means, and the case frame in which the dependency information of the phrase is stored in advance. Natural sentence structure acquisition that obtains sentence structure of the natural sentence by obtaining the rank dependency means for ranking and the dependency information of the clauses classified by the natural sentence phrase classification means by the surface case and the deep case The dependency information coincides with the case sentence structure acquisition means for acquiring the sentence structure by acquiring the phrase dependency information in the surface case and the deep case with respect to the means and the case sentence associated with the answer. Clause Using the natural sentence structure acquisition means, the coincidence degree acquisition means for acquiring the degree of coincidence between the sentence structure acquired by the case sentence sentence structure acquisition means, and the coincidence degree acquisition means A natural sentence matching device comprising: an answer specifying means for specifying the answer using the obtained degree of matching, wherein the degree of matching obtaining means ranks the ranking means when counting the number of phrases of matching phrases. In Therefore, clauses ranked in ranks with a low degree of coincidence between clause list and case frame Provided is a natural sentence matching device characterized by adjusting the number of phrases low.
In the invention according to claim 2, the ranking means compares whether the phrase list acquired by the natural phrase segmentation means matches the case frame related to the phrase in both the surface case and the notation, and does not match. If either of the surface case or notation is matched, and if either of the surface case or notation matches, the phrase is more than the case where both the surface case and notation match. The natural sentence matching apparatus according to claim 1, wherein the rank is reduced.
In the invention according to claim 3, when the degree of coincidence acquisition unit counts the number of phrases, the phrases before and after the dependency of the natural sentence and the phrases before and after the dependency of the case sentence If there is only a match before and after the dependency, , The number of clauses is lower than when both before and after the dependency match, If both before and after dependency do not match , Than if only before or after the dependency The natural sentence matching apparatus according to claim 1, wherein the number of phrases is adjusted to be lower.
In the invention according to claim 4, the coincidence degree acquisition means obtains the degree of coincidence by matching the sentence structure acquired by the natural sentence structure acquisition means against the sentence structure of the case sentence acquired by the case sentence structure acquisition means, using at least one of the surface case and the deep case. This provides the natural sentence matching apparatus according to claim 1.
The invention according to claim 5 further comprises vocabulary information adding means for adding vocabulary information to the words included in the phrases divided by the natural sentence phrase segmentation means, and the coincidence degree acquisition means obtains the degree of coincidence by matching the sentence structure of the natural sentence against the sentence structure of the case sentence using the vocabulary information added to the words by the vocabulary information adding means. This provides the natural sentence matching device according to any one of the claims.
In the invention according to claim 6, the vocabulary information adding means associates, with a word included in a phrase, at least one of a synonym, a near-synonym, a polysemous word, a homonym, and concept information corresponding to that word. This provides the natural sentence matching apparatus according to claim 5.
In the invention according to claim 7, the case sentence structure acquisition means comprises case sentence acquisition means for acquiring the case sentence and case sentence phrase segmentation means for dividing the case sentence acquired by the case sentence acquisition means into phrases. This provides the natural sentence matching device according to any one of claims 1 to 6.
In the invention according to claim 8, the coincidence degree acquisition means comprises table creating means that uses the sentence structure of the case sentence acquired by the case sentence structure acquisition means to create a table associating each phrase included in the case sentence with at least one of the surface case and the deep case of that phrase, and the sentence structure of the natural sentence is matched against the sentence structure of the case sentence using this table. This provides the natural sentence matching device according to any one of the claims.
The invention according to claim 9 provides a natural sentence matching method performed in a computer comprising natural sentence acquisition means, morpheme analysis means, natural sentence phrase segmentation means, ranking means, natural sentence structure acquisition means, case sentence structure acquisition means, coincidence degree analysis means, and answer specifying means, the method comprising: a natural sentence storage step in which the natural sentence acquisition means stores in memory a natural sentence input from an input device; a morpheme analysis step in which the morpheme analysis means performs morphological analysis on the natural sentence stored in the natural sentence storage step to generate a morpheme string, and stores the generated morpheme string in memory; a phrase analysis step in which the natural sentence phrase segmentation means generates a phrase list by combining the independent words and attached words of the morpheme string stored in the morpheme analysis step, and stores the generated phrase list in memory; a ranking step in which the ranking means analyzes the degree of coincidence between the phrase list stored in the phrase analysis step and the case frame stored in the storage device, ranks the phrases, and stores the ranks in memory; a natural sentence structure analysis step in which the natural sentence structure acquisition means analyzes the sentence structure of the natural sentence by specifying, in terms of surface case and deep case, the dependency information of the phrases in the phrase list stored in the phrase analysis step, and stores the resulting sentence structure in memory; a case sentence structure analysis step in which the case sentence structure acquisition means divides the case sentence associated with the answer into phrases, analyzes the dependency information of the divided phrases in terms of surface case and deep case, and stores the resulting sentence structure in memory; a coincidence degree analysis step in which the coincidence degree analysis means analyzes the degree of coincidence between the sentence structure stored in memory in the natural sentence structure analysis step and the sentence structure stored in memory in the case sentence structure analysis step, using the number of phrases whose dependency information matches, and stores the analyzed degree of coincidence in memory; and an answer specifying step in which the answer specifying means specifies the answer using the degree of coincidence stored in memory in the coincidence degree analysis step, wherein the coincidence degree analysis step, when counting the number of matching phrases, adjusts the phrase count lower for phrases that the ranks stored in memory in the ranking step place lower in degree of coincidence between the phrase list and the case frame.
The invention according to claim 10 provides a natural sentence matching program that, when loaded into the memory of a computer and executed by the CPU, causes the computer to realize: a natural sentence storage function of storing in memory a natural sentence input from an input device; a morpheme analysis function of performing morphological analysis on the natural sentence stored by the natural sentence storage function to generate a morpheme string, and storing the generated morpheme string in memory; a phrase analysis function of generating a phrase list by combining the independent words and attached words of the morpheme string stored by the morpheme analysis function, and storing the generated phrase list in memory; a ranking function of analyzing the degree of coincidence between the phrase list stored by the phrase analysis function and the case frame stored in the storage device, ranking the phrases, and storing the ranks in memory; a natural sentence structure analysis function of analyzing the sentence structure of the natural sentence by specifying, in terms of surface case and deep case, the dependency information of the phrases in the phrase list stored by the phrase analysis function, and storing the resulting sentence structure in memory; a case sentence structure analysis function of dividing the case sentence associated with the answer into phrases, analyzing the dependency information of the divided phrases in terms of surface case and deep case, and storing the resulting sentence structure in memory; a coincidence degree analysis function of analyzing the degree of coincidence between the sentence structure stored in memory by the natural sentence structure analysis function and the sentence structure stored in memory by the case sentence structure analysis function, and storing the analyzed degree of coincidence in memory; and an answer specifying function of specifying the answer using the degree of coincidence stored in memory by the coincidence degree analysis function, wherein the coincidence degree analysis function, when counting the number of matching phrases, adjusts the phrase count lower for phrases that the ranks stored in memory place lower in degree of coincidence between the phrase list and the case frame.
[0010]
With the above configuration, the use of semantic information and dependency information makes it possible to judge that semantically identical sentences are similar, and to match case sentences and natural sentences flexibly even when words are misrecognized or misanalyzed (for example, as homonyms) or are omitted.
[0011]
DETAILED DESCRIPTION OF THE INVENTION
In the present embodiment, matching is optimized by matching a natural sentence against case sentences in consideration of both the surface layer and the deep layer. In addition, the software is made more efficient so that it can match against a large number of case sentences.
[0012]
Taking the deep layer of a natural sentence (its semantic structure) into account makes it possible to match natural sentences and case sentences more flexibly.
For example, both “send an email to Mr. Tanaka” and an inverted form of the same sentence, with its phrases in a different order, correspond to the case sentence “send an email to Mr. Tanaka”.
Conventionally, handling a natural sentence entered in inverted form required extra measures, such as preparing an additional case sentence in inverted form.
When the deep layer is taken into account, however, the inverted input can be matched to the case sentence “send an email to Mr. Tanaka” itself, which makes the case sentence dictionary more efficient.
In addition, even when the natural sentence is incomplete, it can be matched appropriately by using the deep structure.
[0013]
Hereinafter, embodiments of the present invention will be described in detail with reference to FIGS.
FIG. 1 is a diagram illustrating an example of a configuration of a natural sentence matching apparatus 1 according to the present embodiment.
The natural sentence matching device 1 includes a central processing unit 2, an input/output unit 3, a RAM (Random Access Memory) 4, a ROM (Read Only Memory) 5, a storage unit 7, and the like, connected by a bus line 6.
[0014]
The input/output unit 3 is connected to input/output devices and is the part that inputs and outputs information between the natural sentence matching device 1 and the outside.
When the natural sentence matching device 1 is built into a television, for example, the input/output unit 3 is connected to a microphone for picking up the user's utterance requests (instructions to the television given by the user in natural sentences) and to an interface for transmitting the information obtained as a matching result to the control unit of the television.
When the natural sentence matching device 1 is built into a personal computer, for example, the input/output unit 3 is connected to a microphone or keyboard for acquiring the user's natural sentence requests, a display for presenting response sentences to the user, and the like.
[0015]
The ROM 5 is a read-only memory, and stores a basic program for operating the natural sentence matching device 1.
The RAM 4 is a readable / writable memory, and provides a working memory for the central processing unit 2 and loads and stores programs and data stored in the storage unit 7.
[0016]
The storage unit 7 is configured by, for example, a hard disk or other nonvolatile memory. It comprises a program unit 8, which stores the natural sentence matching program and other programs, and a database 9, which stores the vocabulary dictionary, case frame dictionary, and attribute dictionary used when matching natural sentences, as well as other data.
Further, when the natural sentence matching device 1 is used for, for example, a voice word processor, word processing software can be stored in the program unit 8.
[0017]
The central processing unit 2 performs the natural sentence matching process according to a program stored in the ROM 5 or the RAM 4, and controls data input and output at the input/output unit 3.
The bus line 6 is a transmission medium used when data is transmitted and received between the central processing unit 2 and other parts such as the input/output unit 3.
[0018]
FIG. 2 is a diagram showing a configuration of the natural sentence matching system 15.
The natural sentence matching system 15 is realized by software by loading a natural sentence matching program stored in the program unit 8 of FIG.
Each component of the natural sentence matching system 15 can also be implemented in hardware.
[0019]
Hereinafter, an outline of the processing contents of each component of the natural sentence matching system 15 will be described. Specific processing examples of each component will be described later.
First, an input sentence input by a user from a microphone or a keyboard is input to the morphological analysis component 17. This input sentence is composed of a natural sentence that is a human natural language.
[0020]
The morphological analysis component 17 performs morphological analysis of the input natural sentence and outputs the result to the phrase analysis component 18 as a morpheme list. A morpheme is a unit obtained by subdividing a phrase further, and morphemes are classified into independent words and attached words. The phrase analysis component 18 creates a phrase list from the morpheme list. A phrase is basically created by combining independent words and attached words in the morpheme list.
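The phrase-building step described above can be sketched as follows. This is only an illustrative sketch: the source says that phrases combine independent words and attached words, and the code additionally assumes that each attached word joins the independent word immediately before it. The `Morpheme` tuple, the function name `build_phrases`, and the romanized sample tokens are hypothetical.

```python
from typing import List, Tuple

# (surface form, True if the morpheme is an independent word); hypothetical layout
Morpheme = Tuple[str, bool]


def build_phrases(morphemes: List[Morpheme]) -> List[List[str]]:
    """Group morphemes into phrases (assumed rule: attached words follow their head)."""
    phrases: List[List[str]] = []
    for surface, is_independent in morphemes:
        if is_independent or not phrases:
            phrases.append([surface])    # an independent word opens a new phrase
        else:
            phrases[-1].append(surface)  # an attached word joins the current phrase
    return phrases


# Romanized stand-ins for "send mail to Hosokawa"
sample = [("Hosokawa", True), ("ni", False), ("mail", True), ("wo", False), ("send", True)]
print(build_phrases(sample))
```

A real implementation would take the independent/attached distinction from the part-of-speech output of the morphological analyzer rather than from a hand-set flag.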
Since it is necessary when processing concepts such as person names and place names later, information such as specific numerical values, person names and place names obtained from the result of morphological analysis is also given to the phrase list.
In addition, normalization of alphabetic characters, katakana, symbols, and the like is also performed. Normalization here means unifying full-width and half-width forms, uppercase and lowercase letters, and kanji variants to one fixed form. For example, when alphabetic characters are unified to half-width lowercase, katakana to full-width, and variant kanji to the standard kanji, a full-width alphabetic string is converted to half-width lowercase, half-width katakana is converted to full-width katakana, and a variant-kanji spelling of “Watanabe” is converted to the standard spelling.
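A minimal sketch of such normalization, assuming Unicode text: Python's standard `unicodedata.normalize` with the NFKC form folds full-width alphanumerics to half-width and half-width katakana to full-width. The variant-kanji table `VARIANT_KANJI` and the function name `normalize_text` are hypothetical, and the patent does not say that Unicode normalization is the method used.

```python
import unicodedata

# Hypothetical variant-kanji table; a real system would need a much larger one.
VARIANT_KANJI = {"邊": "辺", "邉": "辺"}


def normalize_text(text: str) -> str:
    # NFKC folds full-width letters and digits to half-width and
    # half-width katakana to full-width; lower() unifies letter case.
    folded = unicodedata.normalize("NFKC", text).lower()
    # Variant kanji are not touched by NFKC, so map them explicitly.
    return "".join(VARIANT_KANJI.get(ch, ch) for ch in folded)


print(normalize_text("ＡＢＣ"), normalize_text("ｶﾀｶﾅ"), normalize_text("渡邊"))
```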
[0021]
The vocabulary processing component 19 acquires the phrase list from the phrase analysis component 18 and assigns semantic information to it using the vocabulary dictionary 25. Semantic information includes, for example, synonyms, near-synonyms, polysemous words, homonyms, and concept information. These pieces of information are stored in the vocabulary dictionary 25 as tables.
The concept information includes not only colors such as red and blue, directions such as west and east, but also special concepts such as place names and personal names. As will be described later, in the present embodiment, a special concept is used so that a numerical value, a person name, a place name, and the like can be conceptually processed during morphological analysis. In addition, as will be described later, expressions related to time such as 9:20 can be included in the concept.
[0022]
The case frame processing component 20 acquires the phrase list annotated with semantic information from the vocabulary processing component 19, and determines, from the surface case and the concept, the phrase considered to be the object of the verb. At that time, deep case information can be added to the phrase list.
For example, if the input sentence is “send mail to Hosokawa”, the verb is “send”, and the phrase determined from the surface case and the deep case to be the object of this verb is “mail”.
The object of a verb is normally written in the form “~ wo” (a noun followed by the particle “wo”), and is called the “wo case” in the surface case. Because the object of a verb is semantically the target of the verb's action, it is called the “target case” in the deep case.
[0023]
The case frame dictionary 26 stores case frames corresponding to various words. In a case frame, for example, the word “send” has the frame (structure) “send ~ (wo) to ~ (ni)” in the surface case and “send (target case) to (partner case)” in the deep case; the “ni case” and “partner case” correspond to the concept of a personal name, and the “wo case” and “target case” correspond to notations such as mail and letter. Details will be described later.
The case frame processing component 20 determines a phrase that is considered to be an object of the phrase list, and then refers to the case frame dictionary 26 to determine how much the input sentence matches the case frame.
[0024]
When determining which case frame the input sentence matches from the case frame information in the case frame dictionary 26, the input sentence may lack information, or the vocabulary information may be insufficient. In such a case, processing is performed with relaxed matching conditions, for example by matching on deep case information only.
In this way, by loosening the matching conditions, matching can be performed as it is even when matching is inherently difficult.
[0025]
When the input sentence is matched against a case frame, a match in both surface case and concept (or notation) is rank 1. If there is no rank 1 match, a match in only the concept (or notation) or only the surface case is sought, and it is given rank 2.
If there is neither a rank 1 nor a rank 2 match, general dependency information is adopted, and this is rank 3. General dependency information is information of the kind that a phrase in one surface case depends on a verb, while a phrase in another depends on a verb or a sa-hen noun (a noun that forms a verb with “suru”).
As a result of the case frame processing, the case frame processing component 20 generates a sentence structure that carries deep case information and the rank information of the case frame.
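The ranking rule above can be sketched as a small function. This is a hedged illustration, not the patented implementation: the `Slot` and `Phrase` classes, their field names, and the romanized case labels are assumptions introduced here, and rank 3 is simply returned as the fallback that stands for adopting general dependency information.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Slot:
    """One case slot of a case frame (hypothetical layout)."""
    surface_case: str                       # e.g. "ni" or "wo"
    concepts: Set[str] = field(default_factory=set)
    notations: Set[str] = field(default_factory=set)


@dataclass
class Phrase:
    word: str
    surface_case: str
    concept: Optional[str] = None


def rank_against_slot(phrase: Phrase, slot: Slot) -> int:
    case_ok = phrase.surface_case == slot.surface_case
    content_ok = phrase.word in slot.notations or phrase.concept in slot.concepts
    if case_ok and content_ok:
        return 1   # surface case and concept (or notation) both match
    if case_ok or content_ok:
        return 2   # only one of them matches
    return 3       # fall back to general dependency information


partner_of_send = Slot("ni", concepts={"personal name"})
print(rank_against_slot(Phrase("Hosokawa", "ni", "personal name"), partner_of_send))
print(rank_against_slot(Phrase("east", "ni", "direction"), partner_of_send))
```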
[0026]
The attribute assignment component 21 assigns information depending on application software or a device using the natural sentence matching system 15, for example, command parameter information, to the sentence structure (sentence) information. These pieces of information are stored in the attribute dictionary 27.
When the special concept is an attribute, the value of the special concept can be used as the attribute value as it is. For example, an input value can be used as an attribute for a person name, a place name, a numerical value, a time, or the like processed by the concept.
[0027]
The sentence structure acquired by the matching processing component 22 includes vocabulary information, case frame information, rank when matching with the case frame, attribute information, and the like.
The matching processing component 22 obtains the matching degree between the input sentence and the case sentence, and performs a process of adopting the answer of the case sentence having a high matching degree as an answer. The degree of coincidence between the input sentence and the example sentence is obtained by calculating how much the two sentence structures match, and is calculated from the phrase information and the dependency information.
[0028]
More specifically, the matching processing component 22 receives the sentence structure with attributes assigned from the attribute assignment component 21, and matches the input sentence against case sentences using the answer dictionary 28, which is created from the case sentences of a training corpus (pairs of a natural sentence that the user is expected to input and the corresponding answer). It then identifies a case sentence with a high degree of coincidence and outputs the answer obtained from it as output information 23.
The answer is a set of commands specified from case sentences and the like, parameters assigned by the attribute assignment component 21, and the like. That is, when a case sentence is specified by matching, a command and a parameter are specified.
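The degree of coincidence described above, combined with the adjustments stated in the claims (a lower count for lower-ranked phrases, and a lower count when only one side of a dependency matches), might be sketched as follows. The weights in `RANK_WEIGHT`, the half credit for a one-sided match, and all names and sample answers are assumptions made for illustration; the source gives no numerical values.

```python
from typing import Dict, List, Tuple

InputDep = Tuple[str, str, int]   # (dependent word, head word, rank 1 to 3)
CaseDep = Tuple[str, str]         # (dependent word, head word)

RANK_WEIGHT = {1: 1.0, 2: 0.7, 3: 0.4}   # assumed values, not from the source


def dependency_score(dep: InputDep, case_deps: List[CaseDep]) -> float:
    dependent, head, rank = dep
    best = 0.0
    for case_dependent, case_head in case_deps:
        hits = (dependent == case_dependent) + (head == case_head)
        best = max(best, hits / 2)   # 1.0 both sides, 0.5 one side, 0.0 neither
    return best * RANK_WEIGHT[rank]


def coincidence(input_deps: List[InputDep], case_deps: List[CaseDep]) -> float:
    return sum(dependency_score(d, case_deps) for d in input_deps)


def best_answer(input_deps: List[InputDep], answers: Dict[str, List[CaseDep]]) -> str:
    return max(answers, key=lambda a: coincidence(input_deps, answers[a]))


input_deps = [("mail", "forward", 1), ("Hosokawa", "sent", 1), ("sent", "mail", 3)]
answers = {"FORWARD_MAIL": [("mail", "forward")], "SEND_MAIL": [("mail", "send")]}
print(best_answer(input_deps, answers))
```

The sketch compares words literally; the vocabulary information (synonyms, concepts, and so on) that the matching described here also uses is left out for brevity.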
[0029]
Of the dictionaries of the natural sentence matching system 15, the vocabulary dictionary 25 and the case frame dictionary 26 hold general linguistic knowledge, and can be used in common even when the application software or device that receives the output information 23 differs.
On the other hand, the attribute dictionary 27 and the answer dictionary 28 depend on the application software or device, so one of each is created for each application software or device.
Because the natural sentence matching system 15 thus divides its dictionaries into a general linguistic part and a part that depends on the application software or device, the size of the dictionaries that must be added to support new application software or a new device can be kept small.
[0030]
FIG. 3 is a diagram showing the configuration of the vocabulary dictionary 25.
The vocabulary dictionary 25 is composed of a synonym section 31 in which synonyms are registered, a near-synonym section 32 in which near-synonyms are registered, a polysemy section 33 in which polysemous words are registered, a homonym section 34 in which homonyms are registered, a concept section 35 in which concepts such as time and personal names are registered, and the like. The contents of these sections will be described together with the following processes.
[0031]
Next, a specific example of analysis performed by the vocabulary processing component 19 will be described.
[Synonym analysis example]
FIG. 4 shows a synonym table 41 created in the synonym part 31.
The synonym section 31 collects words with the same meaning (synonyms); each synonym is registered in a table in association with a representative word that stands for the group.
For example, two different words that both mean “buy” are synonyms of each other, as are two different words that both mean “use”.
For example, suppose that the input sentence is “use a computer”. First, the morphological analysis component 17 divides it into morphemes:
Morphological analysis → computer / (object marker) / use / (verb ending)
The phrase analysis component 18 then divides it into phrases:
Phrase analysis → computer / use
Finally, the vocabulary processing component 19 produces:
Vocabulary processing → computer / use (synonym = use)
Thus, the representative word of the synonym group (here, the representative word meaning “use”) is given to the phrase list.
[0032]
[Near-synonym analysis example]
FIG. 5 shows a near-synonym table 43 in which “search” and “use” are the representative words.
The near-synonym section 32 stores near-synonym information for handling expressions that are semantically similar but not completely the same.
For example, several different words that all mean roughly “search” are near-synonyms of one another, as are several different words that all mean roughly “use”.
In the near-synonym section 32, one representative word is selected from each group of near-synonyms (for example, the most commonly used word), and the group is registered in a table under it as near-synonym information.
[0033]
For example, when the input sentence is “use a personal computer”, near-synonyms are analyzed in the same manner as synonyms, as follows.
Morphological analysis → personal computer / (object marker) / use / (verb ending)
Phrase analysis → personal computer / use
Vocabulary processing → personal computer / use (near-synonym = use)
In this way, near-synonym information is added to the phrase list by vocabulary processing.
[0034]
[Example of polysemy analysis]
It is the figure which showed an example of the multiple meaning word table 43 of FIG.
A polysemous word is a word that has several meanings. For example, the verb used in “draw a line” has several meanings, such as “subtract”, “draw”, and “pull”.
In the polysemy section 33, these polysemous words are registered as a table.
For example, when the input sentence is “draw a line”, the polysemous word is analyzed as follows.
Morphological analysis → line / (object marker) / draw
Phrase analysis → line / draw
Vocabulary processing → line / draw (polysemy = subtract, draw)
In this way, polysemy information is given to the phrase list.
[0035]
[Analysis example of homonyms]
FIG. 7 is a diagram showing an example of the homonym table 47.
Processing for homonyms is performed by adding homonym information to words that may be interpreted as different meanings in speech recognition.
Examples of homonyms include “contrast”, “target”, and “symmetry” (all pronounced “taishō” in Japanese), and “multiply”, “miss”, “draw”, and “write” (all pronounced “kakeru”).
In the homonym section 34, these homonyms are registered as a table.
[0036]
For example, when the input sentence is “target figure”, the homonym is analyzed as follows.
Morphological analysis → figure / (object marker) / target / to
Phrase analysis → figure / target
Vocabulary processing → figure / target (homonyms = contrast, symmetry)
In this way, homonym information is given to the phrase list. Adding homonym information reduces the possibility of misanalysis in, for example, voice input.
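The synonym, polysemy, and homonym annotations in the examples above all amount to table lookups keyed by a word. A minimal sketch follows, using English stand-ins for the Japanese entries; the table contents, the key names, and the function `annotate` are hypothetical and only mirror the shape of the tables in the vocabulary dictionary 25.

```python
from typing import Dict, List, Union

# English stand-ins for dictionary entries (hypothetical contents)
SYNONYM_REPRESENTATIVE: Dict[str, str] = {"utilize": "use"}
POLYSEMY: Dict[str, List[str]] = {"draw": ["subtract", "draw"]}
HOMONYMS: Dict[str, List[str]] = {"target": ["contrast", "symmetry"]}


def annotate(word: str) -> Dict[str, Union[str, List[str]]]:
    """Attach whatever vocabulary information the tables hold for one word."""
    info: Dict[str, Union[str, List[str]]] = {"word": word}
    if word in SYNONYM_REPRESENTATIVE:
        info["synonym"] = SYNONYM_REPRESENTATIVE[word]
    if word in POLYSEMY:
        info["polysemy"] = POLYSEMY[word]
    if word in HOMONYMS:
        info["homonyms"] = HOMONYMS[word]
    return info


print([annotate(w) for w in ["line", "draw"]])
```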
[0037]
[Example of conceptual information analysis]
FIG. 8 is a diagram showing an example of the concept table 49.
In the concept section 35, as shown for example in the concept table 49, words are registered in a table so that items with the same kind of semantic content, such as colors and directions, can be handled together. For example, the concept information of “up”, “down”, “right”, and “left” is “direction”, and the concept information of “red”, “green”, and “blue” is “color”.
Special concepts include person names, place names, numerical values, and time. Of these, person names, place names, and numerical values are created from information given at the time of morphological analysis, while time values are created from a time table 51 registered in a time concept dictionary (not shown). FIG. 9 is a diagram showing the time table 51.
As shown in FIG. 9, a special concept can also have a value. Also, items that span multiple clauses are separated by commas.
[0038]
For example, when the input sentence is “move up”, the concept information is analyzed as follows.
Morphological analysis → up / to / move / do
Phrase analysis → up / move
Vocabulary processing → up (concept = direction) / move
[0039]
In addition, when the input sentence includes a time expression, as in “search for a file updated on August 3”, the concept is analyzed as follows.
Morphological analysis → 8 / month / 3 / day / to / update / do / (past) / file / (object marker) / search / do
Phrase analysis → 8 / month / 3 / day / updated / file / search
Vocabulary processing → 8 (concept = numerical value; 8) / month / 3 (concept = numerical value; 3) / day / updated / file / search
Thus, a numerical concept can have a value.
In addition, a run of several phrases (8 / month / 3 / day) can be processed together as follows.
Vocabulary processing → 8 / month / 3 / day (concept = time; 2000/8/3) / updated / file / search
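Merging several phrases into a single time concept, as in the last line above, can be sketched with a simple pattern check. This is an assumption-laden stand-in for the time table 51, whose contents the text does not give: the function name `merge_date`, the English tokens “month” and “day”, and the way the year is supplied by the caller are all hypothetical.

```python
from typing import List, Optional, Tuple


def merge_date(phrases: List[str], year: int) -> Optional[Tuple[str, int]]:
    """If the phrases start with <number> month <number> day, return
    (time value, number of phrases consumed); otherwise return None."""
    if (
        len(phrases) >= 4
        and phrases[0].isdigit()
        and phrases[1] == "month"
        and phrases[2].isdigit()
        and phrases[3] == "day"
    ):
        return f"{year}/{int(phrases[0])}/{int(phrases[2])}", 4
    return None


phrases = ["8", "month", "3", "day", "updated", "file", "search"]
print(merge_date(phrases, year=2000))
```

The year is passed in explicitly because the example input names only a month and a day; how the system arrives at the year is not stated in this passage.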
[0040]
Further, when the input sentence includes a personal name, which is a special concept, as in “search for Hirayama's documents”, it is analyzed as follows.
Morphological analysis → Hirayama / / / document / / / search
Phrase analysis → Hirayama's / documents / search
Vocabulary processing → Hirayama's (concept = person name; Hirayama) / documents / search
Thus, the personal name concept and the place name concept can have values.
[0041]
[In the case of multiple clauses and multi-step processing]
The vocabulary processing component 19 can process an expression spanning several phrases in the same manner as in the analysis of the time concept, and can give it information such as synonyms and near-synonyms from the synonym section 31 and the near-synonym section 32.
For example, when the input sentence is “put out from page 3 on A4 paper”, it is analyzed as follows.
Morphological analysis → A4 / paper / on / 3 / page / from / put out
Phrase analysis → A4 / on paper / 3 / from page / put out
Vocabulary processing → A4 / on paper (synonym = paper) / 3 (concept = numerical value; 3) / from page / put out (paper + put out → synonym = print)
The vocabulary processing component 19 further processes “paper”, itself a result of synonym processing, together with “put out”, and gives the synonym “print”.
This works because, in the synonym section 31, the synonym table shown in FIG. 10A gives the synonym information “paper” to the word for paper, and a further synonym table gives the synonym information “print” to the combination of “paper” and “put out”.
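The two-stage lookup described above (first map each word to its representative, then look up a combination of representatives) can be sketched as follows. The table contents are English stand-ins, and the names `WORD_SYNONYMS`, `COMBINED_SYNONYMS`, and `combined_synonym` are hypothetical; the actual tables are those of FIG. 10.

```python
from typing import Dict, FrozenSet, List, Optional

# Stage 1: single word -> representative word (stand-in entry)
WORD_SYNONYMS: Dict[str, str] = {"paper sheet": "paper"}
# Stage 2: combination of representative words -> synonym (stand-in entry)
COMBINED_SYNONYMS: Dict[FrozenSet[str], str] = {
    frozenset({"paper", "put out"}): "print",
}


def combined_synonym(words: List[str]) -> Optional[str]:
    # Replace each word by its representative, if it has one.
    representatives = {WORD_SYNONYMS.get(w, w) for w in words}
    # Then look for a registered combination contained in the sentence.
    for combo, synonym in COMBINED_SYNONYMS.items():
        if combo <= representatives:
            return synonym
    return None


print(combined_synonym(["A4", "paper sheet", "3", "page", "put out"]))
```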
[0042]
Next, a specific example of analysis performed by the case frame processing component 20 will be described.
FIG. 11 is a diagram showing an example of the case frame information table 54 stored in the case frame dictionary 26.
For example, the word “send” (a predicate) is usually used in the form “send (target) to (partner)”. The phrase corresponding to (partner) is called the “ni case” in the surface case and the “partner case” in the deep case, and the phrase corresponding to (target) is called the “wo case” in the surface case and the “target case” in the deep case.
In the case frame information table 54, “mail” and “letter” are registered as notations corresponding to the “wo case” and “target case” of “send”, and “personal name” is registered as concept information corresponding to its “ni case” and “partner case”.
The case frame processing component 20 matches the vocabulary-processed phrase list against the case frame information stored in the case frame dictionary 26, and ranks the degree of matching.
[0043]
For example, when the input sentence is “forward the mail sent to Mr. Hosokawa”, morphological analysis through vocabulary processing proceed as follows.
Morphological analysis → Hosokawa / san / to / send / (past) / mail / (object marker) / forward
Phrase analysis → to Mr. Hosokawa / sent / mail / forward
Vocabulary processing → to Mr. Hosokawa (concept = personal name) / sent / mail / forward
[0044]
The case frame processing component 20 matches the phrase list processed with the vocabulary as described above and the case frame information in the following steps.
Step 1: Using the case frame information table 54, acquire the case frame information obtained from the phrase “forward”.
For the target case, the “target case” column of the “forward” row of the case frame information table 54 in FIG. 11 gives “mail” → “forward”, which matches in notation (mail), so this match is rank 1. ... (1)
For the partner case, the “partner case” column of the “forward” row gives “to Mr. Hosokawa (concept = personal name)” → “forward”, which matches in concept (personal name), so this match is rank 1. ... (2)
[0045]
Step 2: Acquire the case frame information obtained from the phrase “mail”.
There is no corresponding entry in the “word” column of the case frame information table 54, so no case frame information is obtained from this phrase. The “word” column registers words that can be predicates; “mail” is a noun and is not in this column.
[0046]
Step 3: Acquire the case frame information obtained from the phrase “sent”.
The “partner case” column of the “send” row of the case frame information table 54 gives “to Mr. Hosokawa (concept = personal name)” → “sent”, which matches in concept, so this match is rank 1. ... (3)
[0047]
Step 4: Examine the phrases that have no dependency.
The phrase with no dependency is “sent”. In other words, there is no phrase positioned before “sent” that serves as the partner case or target case of “sent”.
On the other hand, “sent” is in the attributive form of the verb, that is, the form that connects to substantives (nouns and pronouns), and it therefore depends on a noun, a sa-hen noun, or an unregistered word.
Here a general dependency is adopted, namely “sent” → “mail”. “Sent” is a modifier of “mail”, and this match is rank 3. ... (4)
[0048]
From (1) to (4), the candidates with the highest rank and the shortest distance between phrases are adopted as dependency information (in this example, (1), (3), and (4) are adopted), and the sentence structure is created. FIG. 12 shows the sentence structure created by the case frame processing component 20.
As shown in FIG. 12, the meaning of the natural sentence input by the user is to “forward” the “mail”, where the “mail” is the one “sent to Mr. Hosokawa”.
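The selection rule (prefer the higher rank, then the shorter distance between phrases) can be sketched as below. Two assumptions are made here and are not stated in the source: each dependent phrase keeps exactly one head, and candidate (2) links “Mr. Hosokawa” to “forward”, which is why it loses to the shorter candidate (3). The `Candidate` type and the phrase positions are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Candidate(NamedTuple):
    label: str
    dependent: int   # position of the dependent phrase in the sentence
    head: int        # position of the phrase it depends on
    rank: int        # 1 is the best rank


def select_dependencies(candidates: List[Candidate]) -> List[Candidate]:
    best: Dict[int, Candidate] = {}
    for c in candidates:
        key = (c.rank, abs(c.head - c.dependent))   # rank first, then distance
        current = best.get(c.dependent)
        if current is None or key < (current.rank, abs(current.head - current.dependent)):
            best[c.dependent] = c
    return sorted(best.values(), key=lambda c: c.label)


# Positions: 0 = to Mr. Hosokawa, 1 = sent, 2 = mail, 3 = forward
candidates = [
    Candidate("(1)", dependent=2, head=3, rank=1),   # mail -> forward
    Candidate("(2)", dependent=0, head=3, rank=1),   # Mr. Hosokawa -> forward (assumed)
    Candidate("(3)", dependent=0, head=1, rank=1),   # Mr. Hosokawa -> sent
    Candidate("(4)", dependent=1, head=2, rank=3),   # sent -> mail
]
selected = select_dependencies(candidates)
print([c.label for c in selected])
```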
[0049]
In addition, when the input sentence is “forward the mail sent to the east”, it is analyzed as follows.
Morphological analysis → east / to / send / (past) / mail / (object marker) / forward
Phrase analysis → to the east / sent / mail / forward
Vocabulary processing → to the east (concept information = direction) / sent / mail / forward
Case frame processing is performed as follows.
Step 1: Acquire the case frame information obtained from the phrase “forward”.
The “target case” column of the “forward” row of the case frame information table 54 gives “mail” → “forward”, which matches in notation, so this match is rank 1. ... (1)
[0050]
Step 2: Acquire the case frame information obtained from the phrase “mail”. The case frame information table 54 yields no case frame for the phrase “mail”.
Step 3: Acquire the case frame information obtained from the phrase “sent”. The case frame information table 54 yields no case frame match for the phrase “sent”.
That is, among the phrases before “sent” (a phrase that depends on “sent” must come before it), none corresponds to either the target case (mail, letter) or the partner case (concept information = personal name).
[0051]
Step 4: Process clauses with no dependency.
The clause without a dependency is “to the east”. In this case, the matching condition is relaxed so that a match is accepted even when one of the surface case, notation, and concept does not match.
From the “partner case” column in the “send” column, “east (concept information = direction)” → “sent” is obtained. The surface case matches as “ni case”, but the concept does not match, so this match is ranked 2. ... (2)
Likewise, from the “partner case” column in the “forward” column, “east (concept information = direction)” → “forward” is obtained. The surface case matches as “ni case”, but the concept information does not match, so this match is also ranked 2. ... (3)
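The ranking applied in the steps above can be sketched as follows. This is a minimal sketch, assuming a case frame slot is a dictionary holding a surface case, a set of accepted notations, and a set of accepted concepts; the field names and the function `rank_against_slot` are hypothetical and do not appear in the specification. Rank 1 requires the surface case and either the notation or the concept to match, and rank 2 corresponds to the relaxed condition.

```python
def rank_against_slot(clause, slot):
    """Return 1 for a full match, 2 for a relaxed match, None for no match."""
    surface_ok = clause["surface_case"] == slot["surface_case"]
    content_ok = (clause["notation"] in slot["notations"]
                  or clause["concept"] in slot["concepts"])
    if surface_ok and content_ok:
        return 1
    if surface_ok or content_ok:
        return 2  # relaxed condition: one of the criteria may fail
    return None

# Hypothetical "partner case" slot of "send": ni case, concept = person name.
partner_slot = {"surface_case": "ni", "notations": set(), "concepts": {"person name"}}

hosokawa = {"notation": "Mr. Hosokawa", "surface_case": "ni", "concept": "person name"}
east = {"notation": "east", "surface_case": "ni", "concept": "direction"}

rank_hosokawa = rank_against_slot(hosokawa, partner_slot)  # surface case and concept match
rank_east = rank_against_slot(east, partner_slot)          # only the surface case matches
```

Rank 3, the general dependency used for “sent” → “mail”, is assigned outside the case frame lookup and is therefore not modeled here.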
[0052]
Furthermore, the clause for which no dependency has been generated is “sent”. Since “sent” is in the adnominal form of a verb, it depends on a noun, a sa-hen noun, or an unregistered word. Here, a general dependency is adopted, namely “sent” → “mail”. “Sent” is a modifier and in this case is ranked 3. ... (4)
The case frame processing component 20 creates a sentence structure by adopting, from (1) to (4), the candidates with the highest rank and the shortest distance between clauses. FIG. 13 shows the created sentence structure. That is, the meaning of the user's input sentence is to “forward” the “mail”, where the “mail” is the one “sent” to the “east”. However, it is not certain whether “east” is a person name.
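The selection among candidates (1) to (4) for this second example can be sketched as below. This is only an illustration under stated assumptions: clause positions are numbered 0 to 3 (east, sent, mail, forward), and the `Candidate` class and `select_dependencies` function are hypothetical names. For each depending clause, the candidate with the best rank is kept, and ties are broken by the shorter distance between clauses.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    dependent: int  # position of the depending clause in the clause list
    head: int       # position of the clause it depends on
    rank: int       # 1 is the best match against the case frame
    label: str      # tag used in the walkthrough, such as "(1)"

def select_dependencies(candidates):
    """Keep one candidate per depending clause: best rank, then shortest distance."""
    best = {}
    for c in candidates:
        key = (c.rank, abs(c.head - c.dependent))
        current = best.get(c.dependent)
        if current is None or key < (current.rank, abs(current.head - current.dependent)):
            best[c.dependent] = c
    return [best[d] for d in sorted(best)]

# Clause positions: 0 = east, 1 = sent, 2 = mail, 3 = forward.
candidates = [
    Candidate(dependent=2, head=3, rank=1, label="(1)"),  # mail -> forward
    Candidate(dependent=0, head=1, rank=2, label="(2)"),  # east -> sent
    Candidate(dependent=0, head=3, rank=2, label="(3)"),  # east -> forward
    Candidate(dependent=1, head=2, rank=3, label="(4)"),  # sent -> mail
]
adopted = sorted(c.label for c in select_dependencies(candidates))
```

With these inputs, (2) is preferred over (3) because both have rank 2 and “east” is closer to “sent” than to “forward”, which agrees with the structure described for FIG. 13.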
[0053]
Next, a specific example of processing performed by the attribute assignment component 21 will be described.
The attribute assignment component 21 adds, to the sentence structure created by the case frame processing component 20, information on values that depend on the parameters of the application software or device that uses the natural sentence matching apparatus 1.
Concept information such as a person name, place name, numerical value, or time can be used as an attribute as it is. When such concept information is used as an attribute, the value acquired when the vocabulary processing component 19 processed the concept information can also be used as the attribute value. Alternatively, the attribute assignment component 21 may acquire the value of the concept information independently when it processes the attribute information.
[0054]
FIG. 14 is a diagram showing an example of the attribute table 57 stored in the attribute dictionary 27. The attribute table 57 stores each phrase together with concept information representing the concept of the phrase and the parameter corresponding to the phrase. For example, the concept information of “thin line” is “line type”, and a thin line is represented by parameter 1 of “line type”. That is, “thin line” is represented as “line type (1)”.
Similarly, the phrase “red” corresponds to “color (0x0000ff)”.
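The attribute lookup can be pictured with a small table. The sketch below is illustrative only: it contains just the two entries mentioned above, and the names `ATTRIBUTE_TABLE`, `lookup_attribute`, and `assign_attributes` are hypothetical.

```python
ATTRIBUTE_TABLE = {
    # phrase: (concept information, parameter)
    "thin line": ("line type", 1),
    "red": ("color", 0x0000FF),
}

def lookup_attribute(phrase):
    """Return (concept, parameter) for a phrase, or None if it has no attribute."""
    return ATTRIBUTE_TABLE.get(phrase)

def assign_attributes(clauses):
    """Attach attribute information to every clause that has a table entry."""
    result = []
    for clause in clauses:
        entry = lookup_attribute(clause)
        attribute = None if entry is None else {"concept": entry[0], "parameter": entry[1]}
        result.append({"clause": clause, "attribute": attribute})
    return result

structure = assign_attributes(["thin line", "draw"])
```

Clauses without an entry, such as “draw” here, simply pass through without an attribute.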
[0055]
For example, assume that the input sentence is “draw a thin line”. This input sentence is processed in the following procedure.

Thus, the attribute assignment component 21 refers to the attribute dictionary 27 and assigns attribute information (in this case, line type attribute = 1) to the sentence structure acquired from the case frame processing component 20.
[0056]
If the input sentence is “draw a thin line”, it is processed as follows.
Morphological analysis → thin / thin / line / draw / draw
Phrase analysis → Thin / Draw / Draw
Vocabulary processing → Thin / Draw / Draw
[Formula 1]

Thus, attribute information can also be acquired from a plurality of clauses.
[0057]
Next, a specific example of processing performed by the matching processing component 22 will be described.
The matching processing component 22 compares the sentence structure of each case sentence in the training corpus with the sentence structure acquired from the attribute assignment component 21 and calculates the degree of coincidence.
The degree of coincidence is a measure of how well the two sentence structures match. For example, a matching index is calculated from the number of clauses in the answer candidate and the number of clauses that match the answer candidate, as shown in the following formula.
(Matching index) = (number of matched clauses) / (number of clauses in the answer candidate)
When the ranking of answer candidates is determined, the number of matched clauses takes priority over the matching index, as shown below.
(Number of matched clauses) > (Matching index)
That is, the answer with the largest number of matched clauses (the numerator of the matching index) is adopted as the matching result, and when the number of matched clauses is the same, the answer is determined by the size of the matching index.
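The ordering rule can be sketched as a sort on a two-part key. This is a minimal sketch with hypothetical candidates A, B, and C; the function names are illustrative and not taken from the specification.

```python
def matching_index(matched, clause_count):
    """(number of matched clauses) / (number of clauses in the answer candidate)."""
    return matched / clause_count

def rank_answers(candidates):
    """candidates maps an answer id to (matched clause count, clause count)."""
    def key(answer):
        matched, clause_count = candidates[answer]
        # The matched clause count is compared first, the matching index second.
        return (matched, matching_index(matched, clause_count))
    return sorted(candidates, key=key, reverse=True)

# Hypothetical candidates: B has the best index but fewer matched clauses.
ranking = rank_answers({"A": (2, 4), "B": (1, 1), "C": (2, 3)})
```

Here C and A both have two matched clauses and are ordered by index, while B comes last despite an index of 1.0.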
[0058]
Note that the clause count used in the matching index is weighted as a measure of how well each clause matches: 0.3 when the clause matches in deep case, and 0.7 when it matches in notation or concept (0.4 when the match was obtained through synonym processing).
[0059]
Further, for a clause whose case frame rank is low (for example, rank 3), the clause count value is halved.
Also, to reflect the dependency information of the clause, the clause count is left as it is if the clauses both before and after the dependency match those of the answer candidate. If only one of them matches, the clause count is further multiplied by 2/3. If neither matches, the clause count is multiplied by 1/3.
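The weighting and adjustments above can be combined into one per-clause value. The sketch below assumes that the deep case weight and the notation or concept weight add up, so that a clause matching on both counts as 1; that additive reading is an assumption, although it is consistent with terms such as 0.7 / 2 × 2/3 in the worked examples. The function name and parameters are hypothetical.

```python
def clause_value(deep_case_match, content_match, via_synonym=False,
                 low_rank=False, dependency_sides_matched=2):
    """Weighted count contributed by one matched clause."""
    value = 0.0
    if deep_case_match:
        value += 0.3
    if content_match:
        value += 0.4 if via_synonym else 0.7
    if low_rank:
        value /= 2  # halved for a low case frame rank such as rank 3
    # 2 sides matched: unchanged, 1 side: x 2/3, 0 sides: x 1/3
    factor = {2: 1.0, 1: 2 / 3, 0: 1 / 3}[dependency_sides_matched]
    return value * factor

full = clause_value(True, True)
notation_only_low_rank = clause_value(False, True, low_rank=True,
                                      dependency_sides_matched=1)
```

The second call reproduces the term 0.7 / 2 × 2/3 that appears for a clause whose deep case does not match, whose rank is 3, and for which only one side of the dependency matches.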
[0060]
Specific examples of matching are shown below.
A training corpus consisting of answers 1 to 5 as shown in FIG. 15 is assumed. Answer 2 contains the concept of time, and answers 3 and 5 contain the concept of person names.
Note that parameter 1 of the search command represents the type of item to be searched, and “1 = 4” denotes mail. Parameter 2 represents the sender name, parameter 3 represents the time, and parameter 5 represents the search direction, where “5 = 1” denotes a search in descending order (downward). Parameter 1 of the send command represents the destination.
For each case sentence in the training corpus 60 shown in FIG. 15, morphological analysis, phrase analysis, vocabulary processing, case frame processing, and processing up to attribute assignment are performed to create these sentence structures.
[0061]
FIG. 16 shows the sentence structure obtained by analyzing the case sentence of the training corpus 60. From top to bottom, they correspond to Answer 1 to Answer 5 in order.
Next, an answer dictionary for answer search necessary for matching is created.
In the present embodiment, the answer dictionary 28 is created at the time of matching, but it may be created and stored in advance for various example sentences.
For the answer search, an answer dictionary 28 in which the sentence structure of the example sentence and answer candidates are linked is used.
FIG. 17 is an example of an answer search table 63 created in the answer dictionary 28. This is created from the sentence structure shown in FIG.
The answer search table 63 is a table in which the words and phrases of the case sentences are broken down on the basis of their sentence structures and organized by word or phrase (notation or attribute).
[0062]
In the “notation / attribute” column, the notations or attributes appearing in the training corpus 60 are recorded.
In the “deep case” column, the deep case of each notation or attribute is recorded. In the “surface case” column, the surface case of each notation or attribute is recorded.
State 1, state 2, and so on in the “state” column are labels that distinguish the notations or attributes classified by deep case or surface case. The “answer candidate” column lists the answers that include the notation or attribute on which each state is based. The “continuous phrase” column indicates, as a state, the clause to which the notation or attribute is connected in each answer.
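In effect, such a table is an inverted index from a notation or attribute to the answers that contain it. The sketch below is a simplified illustration: the two sentence structures are hypothetical and are not those of FIG. 16, and the “state” and “continuous phrase” columns are omitted.

```python
from collections import defaultdict

def build_answer_search_table(structures):
    """Map notation/attribute -> (deep case, surface case) -> set of answer ids."""
    table = defaultdict(dict)
    for answer_id, clauses in structures.items():
        for notation, deep_case, surface_case in clauses:
            table[notation].setdefault((deep_case, surface_case), set()).add(answer_id)
    return dict(table)

# Hypothetical sentence structures for two answers.
structures = {
    1: [("search", "predicate", None), ("mail", "target case", "wo case")],
    5: [("send", "predicate", None), ("mail", "target case", "wo case"),
        ("* person name", "partner case", "ni case")],
}
table = build_answer_search_table(structures)
```

Looking up a clause of the input sentence then returns all answer candidates at once, rather than requiring a comparison with each case sentence in turn.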
[0063]
FIG. 18 is a diagram showing each candidate answer and the commands, parameters, conditions, and the like corresponding to those answers.
For example, for answer 1, the corresponding command is “search” and the value of parameter 1 is 4. The condition for selecting answer 1 is that the sentence structure of the input sentence includes the notation “search” as a predicate, that is, state 1 and state 2.
[0064]
A specific example of matching performed using the answer dictionary 28 created from the training corpus as described above will now be described.
When the input sentence is “search for mail”, the components from the morphological analysis component 17 to the attribute assignment component 21 create the following sentence structure with attributes assigned.
(Predicate) search
(Target case) Mail (wo case)
[0065]
Next, answer candidates corresponding to the “search” predicate are searched from the “search” column of the answer search table 63. As a result, answers 1 to 4 are candidates.
Next, answer candidates corresponding to “mail” of the target case are searched from the “mail” column of the answer search table 63. As a result, answers 1 to 5 are candidates.
From the above, answers 1 to 5 are the answer candidates.
[0066]
Next, the degree of coincidence for each answer candidate is calculated.
[Degree of match for answer 1]
The matching index is as follows:
(Matching index) = 2/2 = 1.00
[Degree of match for answer 2]
Since nothing depends on the “mail” clause, the “mail” clause counts as 1 × 2/3. As a result, the matching index is as follows.
(Matching index) = (1 × 2/3 + 1) /3=0.56
[0067]
[Degree of match for answer 3]
Since nothing depends on the “mail” clause, the “mail” clause counts as 1 × 2/3. As a result, the matching index is as follows.
(Matching index) = (1 × 2/3 + 1) /4=0.42
[Degree of match for answer 4]
Since nothing depends on the “search” clause, the “search” clause counts as 1 × 2/3. As a result, the matching index is as follows.
(Matching index) = (1 × 2/3 + 1) /3=0.56
[0068]
[Degree of match for answer 5]
The clause “mail” counts as 1 × 1/3 because neither side of its dependency matches. As a result, the matching index is as follows.
(Matching index) = (1 × 1/3) /3=0.11
As a result of the above, the command of answer 1, which has the highest degree of coincidence, is adopted as the answer.
[0069]
Next, matching when the input sentence is “Search Hosokawa's mail” will be described.
First, the components from the morphological analysis component 17 to the attribute assignment component 21 analyze the input and produce a sentence structure to which attribute information has been added.
“Search Hosokawa's mail” is analyzed as follows.
(Predicate) Search
(Target case) Mail (wo case)
(????) Mr. Hosokawa's (no case) (person name = Hosokawa)
[0070]
The deep case of the clause “Mr. Hosokawa's” is shown as (????) because the case frame for “search” contains no entry for the “no case” or for the concept (person name), so a general dependency is adopted.
In this case, it is assumed that no case frame information such as the following exists.
(Phrase) Search, (Surface case) no case, (Deep case) target case, (Concept / Notation) * person name
Next, the predicate “search” is searched for in the “search” column of the answer search table 63. As a result, answers 1 to 4 are candidates.
Next, “mail” of the target case is searched for in the “mail” column of the answer search table 63. As a result, answers 1 to 5 are candidates.
Next, a search restricted by case is made in the answer search table 63 for “Mr. Hosokawa” or “* person name”. As a result, there are no answer candidates.
Next, “Mr. Hosokawa” is searched for with the deep case and surface case conditions ignored.
Next, “* person name” is searched for with the deep case and surface case conditions ignored. As a result, answers 3 and 5 are candidates.
From the above results, answers 1 to 5 are the answer candidates.
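The staged search for the “Mr. Hosokawa's” clause can be sketched as follows. This is a sketch under assumptions: the table fragment and its case labels (“sender”, “kara case”, and so on) are hypothetical, since the contents of FIG. 17 are not reproduced here, and `find_candidates` is an illustrative name. The strict pass applies the deep case and surface case conditions, and the relaxed pass ignores them.

```python
def find_candidates(table, keys, deep_case, surface_case):
    """Search strictly first, then with the case conditions ignored.

    Returns (answer ids, True if the relaxed pass was needed)."""
    for relaxed in (False, True):
        for key in keys:
            hits = set()
            for (deep, surface), answers in table.get(key, {}).items():
                if relaxed or (deep, surface) == (deep_case, surface_case):
                    hits |= answers
            if hits:
                return hits, relaxed
    return set(), True

# Hypothetical fragment of the answer search table: only answers 3 and 5
# contain a person name, and the case labels are assumptions.
table = {
    "* person name": {
        ("sender", "kara case"): {3},
        ("partner case", "ni case"): {5},
    },
}
hits, relaxed = find_candidates(
    table, ["Mr. Hosokawa", "* person name"], None, "no case")
```

The notation “Mr. Hosokawa” finds nothing in either pass, and the attribute “* person name” finds answers 3 and 5 only once the case conditions are ignored, as in the walkthrough above.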
[0071]
Next, the matching processing component 22 calculates the degree of matching for each answer candidate.
[Degree of match for answer 1]
The matching index is as follows:
(Matching index) = (1 × 2/3 + 1) /2=0.83
The numerator of the matching index is 1.67.
[Degree of match for answer 2]
(Matching index) = (1 × 2/3 + 1) /3=0.56
The numerator of the matching index is 1.67.
[0072]
[Degree of match for answer 3]
The clause “Mr. Hosokawa's” counts as 0.7 / 2 × 2/3 because the deep case does not match, the rank of the case frame is 3, and one side of its dependency does not match. The clause “mail” counts as 1 × 2/3 because one side of its dependency does not match. Therefore, the matching index is as follows:
(Matching index) = (0.7 / 2 × 2/3 + 1 × 2/3 + 1) /4=0.48
The numerator of the matching index is 1.90.
[Degree of match for answer 4]
Since nothing depends on the “search” clause, the “search” clause counts as 1 × 2/3. Therefore, the matching index is as follows:
(Matching index) = (1 + 1 × 2/3) /3=0.56
The numerator of the matching index is 1.67.
[0073]
[Degree of match for answer 5]
The clause “Mr. Hosokawa's” counts as 0.7 / 2 × 2/3 because the deep case does not match, the rank of the case frame is 3, and one side of its dependency does not match. In addition, the clause “mail” counts as 1 × 1/3 because neither side of its dependency matches. Therefore, the matching index is as follows:
(Matching index) = (0.7 / 2 × 2/3 + 1 × 1/3) /3=0.19
The numerator of the matching index is 0.57.
As a result, answer 3 has the largest numerator of the matching index. That is, answer 3 is the first answer candidate.
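The selection in this example can be checked with a few lines of arithmetic. The sketch below takes the per-clause values and clause counts exactly as listed above and compares the candidates by numerator first and matching index second; the variable names are illustrative.

```python
# Per-clause values and clause counts as listed in the worked example.
answers = {
    1: ([1 * 2 / 3, 1], 2),
    2: ([1 * 2 / 3, 1], 3),
    3: ([0.7 / 2 * 2 / 3, 1 * 2 / 3, 1], 4),
    4: ([1, 1 * 2 / 3], 3),
    5: ([0.7 / 2 * 2 / 3, 1 * 1 / 3], 3),
}

def score(values, clause_count):
    numerator = sum(values)
    return numerator, numerator / clause_count

scores = {answer: score(values, n) for answer, (values, n) in answers.items()}

# Tuples compare the numerator first and the matching index second.
best = max(scores, key=lambda answer: scores[answer])
highest_index = max(scores, key=lambda answer: scores[answer][1])
```

Answer 1 has the highest matching index (about 0.83), but answer 3 is selected because its numerator (about 1.90) is the largest, which is the priority rule stated for the ranking of answer candidates.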
[0074]
Next, a description will be given of matching in the case where the input sentence is an inverted type such as “I want to search for mail”.
First, the components from the morphological analysis component 17 to the attribute assignment component 21 analyze the input and produce a sentence structure to which attribute information has been added.
“I want to search for mail” is analyzed as follows.
(Predicate) None
(???) Mail (φ case)
(???) no (no case)
(Communication) Search (hope)
The above (???) indicates that the deep case is not determined because the surface case is “no case”. Further, as the above analysis results show, nothing corresponds to a predicate in the inverted type.
[0075]
Next, since no deep case is set for “mail”, the matching processing component 22 ignores the deep case and surface case conditions (that is, relaxes the matching condition) and searches for “mail” in the “notation / attribute” column of the answer search table 63 created in the answer dictionary 28. As a result, answers 1 to 5 are candidates.
Next, since no deep case is set for the “no case” clause either, the matching processing component 22 ignores the deep case and surface case conditions and searches for “no” in the “notation / attribute” column of the answer search table 63. As a result, there is no item corresponding to “no”.
[0076]
Next, the matching processing component 22 searches the answer search table 63 for an item whose “notation / attribute” column is “search” and whose “deep case” column is adnominal modification. As a result, there are no applicable items.
Next, the matching processing component 22 ignores the deep case condition of “search” and searches for the corresponding item in the “notation / attribute” column of the answer search table 63. As a result, answers 1 to 4 become answer candidates.
From the above search results, answers 1 to 5 are the answer candidates.
[0077]
Next, the matching processing component 22 calculates the degree of matching for each answer candidate.
[Degree of match for answer 1]
“Search” is 0.7 / 2 × 1/3 because only the notation matches and the dependency does not match at all.
“Mail” is 0.7 / 2 × 1/3 because only the notation matches and before and after the dependency does not match at all.
Therefore, the matching index is as follows:
(Matching index) = (0.7 / 2 × 1/3 + 0.7 / 2 × 1/3) /2=0.12
The numerator of the matching index is 0.23.
[0078]
[Degree of match for answer 2]
“Search” is 0.7 / 2 × 1/3 because only the notation matches and the dependency does not match at all.
“Mail” is 0.7 / 2 × 1/3 because only the notation matches and before and after the dependency does not match at all.
Therefore, the matching index is as follows:
(Matching index) = (0.7 / 2 × 1/3 + 0.7 / 2 × 1/3) /3=0.08
The numerator of the matching index is 0.23.
[0079]
[Degree of match for answer 3]
“Search” is 0.7 / 2 × 1/3 because only the notation matches and the dependency does not match at all.
“Mail” is 0.7 / 2 × 1/3 because only the notation matches and before and after the dependency does not match at all.
Therefore, the matching index is as follows:
(Matching index) = (0.7 / 2 × 1/3 + 0.7 / 2 × 1/3) /4=0.06
The numerator of the matching index is 0.23.
[0080]
[Degree of match for answer 4]
“Search” is 0.7 / 2 × 1/3 because only the notation matches and the dependency does not match at all.
“Mail” is 0.7 / 2 × 1/3 because only the notation matches and before and after the dependency does not match at all.
Therefore, the matching index is as follows:
(Matching index) = (0.7 / 2 × 1/3 + 0.7 / 2 × 1/3) /3=0.08
The numerator of the matching index is 0.23.
[0081]
[Degree of match for answer 5]
“Mail” is 0.7 / 2 × 1/3 because only the notation matches and before and after the dependency does not match at all.
Therefore, the matching index is as follows:
(Matching index) = (0.7 / 2 × 1/3) /3=0.04
The numerator of the matching index is 0.12.
[0082]
From the above calculation results, the numerator of the matching index is the same for answers 1 to 4, but answer 1 has the highest matching index, so answer 1 is adopted. That is, answer 1 is set as the first answer candidate.
[0083]
FIG. 19 is a flowchart showing the operation of the natural sentence matching system 15.
First, the natural sentence matching system 15 acquires the natural sentence input by the user (step 1), and the morphological analysis component 17 decomposes the input sentence into morphemes (step 2).
The morphological analysis component 17 outputs the morpheme list to the phrase analysis component 18. The phrase analysis component 18 divides the input sentence into phrases using the morpheme list and generates a phrase list (step 30).
[0084]
Next, the vocabulary processing component 19 acquires the phrase list from the phrase analysis component 18. The vocabulary processing component 19 adds information such as synonyms and near-synonyms registered in the vocabulary dictionary 25 to the phrase list and outputs the phrase list to the case frame processing component 20 (step 40).
Next, the case frame processing component 20 uses the case frame dictionary 26 to acquire, for each clause in the phrase list to which information such as synonyms and near-synonyms has been added, the surface case (“wo case”, “ni case”, etc.) and the deep case (“target case”, “partner case”, etc.), and determines the sentence structure (step 50).
[0085]
Next, the attribute assignment component 21 acquires the sentence structure from the case frame processing component 20, assigns to the sentence structure parameters specific to the application software or device in which the natural sentence matching system 15 is incorporated, for example “line type (1)” for “thin line” in the sentence, and outputs the result to the matching processing component 22 (step 60).
[0086]
Next, the matching processing component 22 creates an answer dictionary 28 using the training corpus 29.
The case sentences of the training corpus 29 are analyzed by the morphological analysis component 17 and the subsequent components, and sentence structures are created in the same manner as for the input sentence. The matching processing component 22 creates, in the answer dictionary 28, an answer search table 63 for comparing the sentence structures of the case sentences with the sentence structure of the input sentence.
Then, the matching processing component 22 calculates the degree of coincidence between the sentence structure of the input sentence and the sentence structure of each case sentence using the answer search table 63, and outputs the command obtained from the case sentence with the highest degree of coincidence as the answer (step 80).
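The order of processing in the flowchart can be pictured as a chain of stages, each consuming the output of the previous one. The sketch below is purely illustrative: every stage is a placeholder that only wraps its input, and the names `run_pipeline`, `placeholder`, and `STAGES` are hypothetical rather than part of the described system.

```python
def run_pipeline(sentence, stages):
    """Pass the data through each stage in order and record the stage names."""
    data, trace = sentence, []
    for name, stage in stages:
        data = stage(data)
        trace.append(name)
    return data, trace

def placeholder(label):
    """Stand-in for a real component: wraps its input under the given label."""
    return lambda data: {label: data}

STAGES = [
    ("morphological analysis", placeholder("morphemes")),
    ("phrase analysis", placeholder("phrases")),
    ("vocabulary processing", placeholder("vocabulary")),
    ("case frame processing", placeholder("structure")),
    ("attribute assignment", placeholder("attributes")),
    ("matching", placeholder("answer")),
]

result, trace = run_pipeline("search for mail", STAGES)
```

The same chain, up to attribute assignment, is applied to the case sentences of the training corpus before the answer search table is built.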
[0087]
The natural sentence matching device 1 according to the present embodiment described above provides the following effects.
Since sentences that have the same or a similar meaning can be covered by a single case sentence, it is not necessary to prepare sentences with various expressions of the same meaning when the training corpus 29 is created. For this reason, the cost of creating the training corpus can be reduced, and storage capacity can be saved.
Even if the input sentence contains omitted words or entirely different words, as long as many parts are semantically the same, matching makes it possible to estimate the omitted (or missing) words, as well as words that were misrecognized or misanalyzed, for example during voice input.
[0088]
By using phrase semantic information and dependency information, matching with higher accuracy than keyword matching can be performed.
Moreover, since the degree of coincidence is used for matching, it is possible to rank the answers. For this reason, when there are a large number of candidates, answers can be presented to the user in descending order of coincidence.
Furthermore, by adopting a method with good matching efficiency, even if the number of cases is large, the processing time can be shortened compared with the case where matching is performed one by one.
In addition, when application software is added, only a dictionary (attribute dictionary and answer dictionary) depending on the application software needs to be added, so that the application can be handled with a small dictionary size.
Further, by using an answer dictionary, matching can be performed efficiently even when there are many example sentences.
[0089]
[Effect of the invention]
According to the present invention, a case sentence and an input natural sentence (input sentence) can be matched flexibly using semantic information and dependency information, and it is possible to provide a natural sentence matching device, a natural sentence matching method, and a natural sentence matching program that can perform matching efficiently even when the number of case sentences increases.
[Brief description of the drawings]
FIG. 1 is a diagram showing an example of a hardware configuration of a natural sentence matching apparatus 1 according to the present embodiment.
FIG. 2 is a diagram showing a configuration of a natural sentence matching system.
FIG. 3 is a diagram showing an example of the configuration of a vocabulary dictionary.
FIG. 4 is a diagram showing an example of a synonym table.
FIG. 5 is a diagram showing an example of a near-synonym table.
FIG. 6 is a diagram illustrating an example of a polysemy table.
FIG. 7 is a diagram showing an example of a homonym table.
FIG. 8 is a diagram showing an example of a concept table.
FIG. 9 is a diagram showing an example of a time table.
FIG. 10 is a diagram for explaining multistage processing performed in vocabulary processing;
FIG. 11 is a diagram showing an example of a case frame information table.
FIG. 12 is a diagram showing an example of a sentence structure created by a case frame processing component.
FIG. 13 is a diagram showing another example of a sentence structure created by a case frame processing component.
FIG. 14 is a diagram illustrating an example of an attribute table.
FIG. 15 is a diagram showing an example of a training corpus.
FIG. 16 is a diagram showing a sentence structure of an answer of a training corpus.
FIG. 17 is a diagram showing an example of an answer search table.
FIG. 18 is a diagram showing commands and the like corresponding to answers.
FIG. 19 is a flowchart showing the operation of the natural sentence matching system.
[Explanation of symbols]
1 Natural sentence matching device
2 Central processing unit
3 Input / output section
4 RAM
5 ROM
6 Bus lines
7 Memory part
8 Program part
9 Database section
15 Natural sentence matching program
16 Input information
17 Morphological analysis component
18 Phrase analysis component
19 Vocabulary processing component
20 Case frame processing component
21 Attribute assignment component
22 Matching processing component
23 Output information
25 Vocabulary dictionary
26 Case frame dictionary
27 Attribute dictionary
31 Synonym section
32 Near-synonym section
33 Polysemy section
34 Homonym section
35 Concept section
41 Synonym table
43 Near-synonym table
45 Polysemy table
47 Homonym table
49 Concept table
51 Time table
54 Case frame information table
57 Attribute table
60 Training corpus
63 Answer search table

Claims

A natural sentence matching device comprising:
natural sentence acquisition means for acquiring an input natural sentence;
natural sentence clause classification means for classifying the natural sentence acquired by the natural sentence acquisition means into clauses and generating a clause list;
ranking means for ranking the clauses according to the degree of coincidence between the clause list generated by the natural sentence clause classification means and a case frame in which dependency information of the clauses is stored in advance;
natural sentence structure acquisition means for acquiring, in terms of surface case and deep case, dependency information of the clauses classified by the natural sentence clause classification means, and thereby acquiring the sentence structure of the natural sentence;
case sentence structure acquisition means for acquiring the sentence structure of a case sentence associated with an answer by acquiring dependency information of its clauses in terms of surface case and deep case;
coincidence degree acquisition means for acquiring the degree of coincidence between the sentence structure acquired by the natural sentence structure acquisition means and the sentence structure acquired by the case sentence structure acquisition means, using the number of clauses whose dependency information matches; and
answer specifying means for specifying the answer using the degree of coincidence acquired by the coincidence degree acquisition means,
wherein the coincidence degree acquisition means, when counting the number of matching clauses, adjusts the clause count value lower for a clause that the ranking means has ranked lower in degree of coincidence between the clause list and the case frame.

The natural sentence matching device according to claim 1, wherein the ranking means compares the clause list generated by the natural sentence clause classification means with the case frame relating to each clause in terms of both surface case and notation, and, when only one of the surface case and the notation matches, ranks the clause lower than when both the surface case and the notation match.

The natural sentence matching device according to claim 1, wherein the coincidence degree acquisition means, when counting the number of clauses, compares the clauses before and after each dependency in the natural sentence with the clauses before and after the dependency in the case sentence, adjusts the clause count lower when only one of the clauses before and after the dependency matches than when both match, and adjusts the clause count lower still when neither matches than when only one matches.

The natural sentence matching device according to claim 1, wherein the coincidence degree acquisition means acquires the degree of coincidence by matching the sentence structure acquired by the natural sentence structure acquisition means against the sentence structure of the case sentence acquired by the case sentence structure acquisition means, using at least one of surface case and deep case.

The natural sentence matching device according to any one of claims 1 to 4, further comprising vocabulary information giving means for giving vocabulary information to the words and phrases included in the clauses classified by the natural sentence clause classification means,
wherein the coincidence degree acquisition means acquires the degree of coincidence by matching the sentence structure of the natural sentence against the sentence structure of the case sentence using the vocabulary information given to the words and phrases by the vocabulary information giving means.

The natural sentence matching device according to claim 5, wherein the vocabulary information giving means associates, with a word or phrase included in the clause, at least one of a synonym, a near-synonym, a polysemous word, a homonym, and concept information corresponding to the word or phrase.

The natural sentence matching device according to any one of claims 1 to 6, wherein the case sentence structure acquisition means comprises:
case sentence acquisition means for acquiring the case sentence; and
case sentence clause classification means for classifying the case sentence acquired by the case sentence acquisition means into clauses.

The natural sentence matching device according to any one of claims 1 to 7, wherein the coincidence degree acquisition means further comprises table creation means for creating, from the sentence structure of the case sentence acquired by the case sentence structure acquisition means, a table for matching the sentence structure of the natural sentence against the sentence structure of the case sentence, using the words and phrases included in the case sentence and at least one of the surface case and the deep case of each word or phrase.

Natural sentence acquisition means, morpheme analysis means, natural sentence clause classification means, ranking means, natural sentence structure acquisition means, case sentence sentence structure acquisition means, coincidence degree analysis means, answer identification means, In a computer with
A natural sentence storage step of storing the natural sentence input from the input device in the memory by the natural sentence acquisition means;
The morpheme analyzing means generates a morpheme string by performing morpheme analysis on the natural sentence stored in the natural sentence storage step, and stores the generated morpheme string in a memory;
A phrase analysis step of generating a phrase list by combining the independent words and the adjunct words of the morpheme sequence stored in the morpheme analysis step by the natural phrase segmentation means; and storing the generated phrase list in a memory;
The ranking means analyzes the degree of coincidence between the phrase list stored in the phrase analysis step and the case frame stored in the storage device, ranks the phrases, and stores the rank in the memory. A ranking step;
Analyzing the sentence structure of the natural sentence by specifying the dependency information of the phrase in the phrase list stored in the phrase analysis step by the natural sentence structure obtaining means by specifying the surface case and the deep case, A natural sentence structure analysis step for storing the sentence structure obtained as a result of the analysis in a memory;
The sentence structure obtained by the case sentence structure acquisition means classifies the case sentence associated with the answer into clauses, analyzes the dependency information of the classified clauses in a surface case and a deep case, and obtains the result of the analysis Example sentence structure analysis step for storing the structure in memory;
The degree of coincidence between the sentence structure stored in the memory in the natural sentence structure analysis step and the sentence structure stored in the memory in the case sentence structure analysis step is calculated by the coincidence degree analysis unit. Analyzing using the number of clauses and storing the analyzed matching score in memory,
An answer specifying step in which the answer identification means specifies the answer using the degree of matching stored in the memory in the matching degree analysis step;
A natural sentence matching method for performing the above steps,
Characterized in that, when the number of matching phrases is counted in the matching degree analysis step, the phrase count is adjusted lower for phrases that, according to the ranks stored in the memory in the ranking step, are ranked as having a lower degree of matching between the phrase list and the case frame.
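
The rank-adjusted count in the matching degree analysis step can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the claims: the `Clause` record, the `RANK_WEIGHT` table and its values, the fallback weight for unlisted ranks, and the rule that two phrases match when their head word and deep case coincide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Clause:
    head: str          # independent word of the phrase, e.g. "文書"
    surface_case: str  # surface case marker, e.g. "を" (may be empty)
    deep_case: str     # deep case label, e.g. "OBJECT" (hypothetical label set)
    rank: int = 1      # 1 = best agreement with the case frame

# Hypothetical weights: a lower-ranked phrase contributes less than one full count.
RANK_WEIGHT = {1: 1.0, 2: 0.7, 3: 0.4}

def matching_degree(input_clauses, case_clauses):
    """Sum the rank-weighted counts of input phrases that match a case-sentence phrase."""
    # Assumption: a match means the head word and the deep case coincide.
    case_keys = {(c.head, c.deep_case) for c in case_clauses}
    score = 0.0
    for c in input_clauses:
        if (c.head, c.deep_case) in case_keys:
            # Unlisted ranks fall back to a small weight (also an assumption).
            score += RANK_WEIGHT.get(c.rank, 0.1)
    return score

def identify_answer(input_clauses, cases):
    """Return the answer of the case sentence with the highest matching degree.

    `cases` is a list of (answer, case_clauses) pairs.
    """
    best_answer, _ = max(cases, key=lambda item: matching_degree(input_clauses, item[1]))
    return best_answer
```

With this weighting, an input phrase that agrees poorly with the case frame still counts toward the match, but by less than a full phrase, so a case sentence matched through well-ranked phrases is preferred.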

By being loaded into a computer's memory and executed by a CPU,
A natural sentence storage function for storing a natural sentence input from an input device in a memory;
A morpheme analysis function for generating a morpheme string by morphological analysis of the natural sentence stored by the natural sentence storage function, and storing the generated morpheme string in a memory;
A phrase analysis function for generating a phrase list by combining the independent words and the adjunct words of the morpheme string stored by the morpheme analysis function, and storing the generated phrase list in a memory;
A ranking function for analyzing the degree of coincidence between the phrase list stored by the phrase analysis function and a case frame stored in a storage device, ranking the phrases, and storing the rank in a memory;
A natural sentence structure analysis function for analyzing the sentence structure of the natural sentence by specifying, in terms of surface case and deep case, the dependency information of the phrases in the phrase list stored by the phrase analysis function, and storing the resulting sentence structure in a memory;
A case sentence structure analysis function for dividing a case sentence associated with an answer into phrases, analyzing the dependency information of the divided phrases in terms of surface case and deep case, and storing the resulting sentence structure in a memory;
A matching degree analysis function for analyzing the degree of matching between the sentence structure stored in the memory by the natural sentence structure analysis function and the sentence structure stored in the memory by the case sentence structure analysis function, using the number of phrases whose dependency information matches, and storing the analyzed degree of matching in a memory;
An answer identifying function for identifying the answer using the degree of matching stored in the memory by the matching degree analysis function;
A natural sentence matching program that realizes the above functions,
Characterized in that, when the matching degree analysis function counts the number of matching phrases, the phrase count is adjusted lower for phrases that, according to the ranks stored in the memory, are ranked as having a lower degree of matching between the phrase list and the case frame.
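
The phrase analysis function, which combines independent words with the adjunct words that follow them, can be sketched as below. This is a simplified illustration, not the claimed implementation: the morpheme representation as `(surface, is_independent)` pairs and the function name `build_phrase_list` are hypothetical, and the independence flags are assumed to come from a morphological analyzer that is not shown.

```python
def build_phrase_list(morphemes):
    """Group a morpheme string into phrases.

    `morphemes` is a list of (surface, is_independent) pairs. Each independent
    word opens a new phrase, and each adjunct word is attached to the phrase
    opened by the preceding independent word. This is a simplification: real
    phrase segmentation also handles prefixes, compounds, and similar cases.
    """
    phrases = []
    for surface, is_independent in morphemes:
        if is_independent or not phrases:
            # Start a new phrase (also covers an adjunct word at sentence start).
            phrases.append([surface])
        else:
            phrases[-1].append(surface)
    return ["".join(parts) for parts in phrases]


# Example: the case sentence 「送った文書が見たい」 from the description.
morphemes = [
    ("送っ", True), ("た", False),
    ("文書", True), ("が", False),
    ("見", True), ("たい", False),
]
print(build_phrase_list(morphemes))  # ['送った', '文書が', '見たい']
```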