JP2015087796A

JP2015087796A - Questioning field determination device and questioning field determination method

Info

Publication number: JP2015087796A
Application number: JP2013223247A
Authority: JP
Inventors: Wataru Uchida (内田 渉); Kosuke Tsujino (辻野 孝輔); Takeshi Yoshimura (吉村 健)
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2013-10-28
Filing date: 2013-10-28
Publication date: 2015-05-07
Anticipated expiration: 2033-10-28
Also published as: JP6178208B2

Abstract

PROBLEM TO BE SOLVED: To accurately determine whether a question sentence belongs to a specific field.
SOLUTION: A question answering device 10 includes a question sentence input unit 11 that inputs a question sentence; an entity extraction unit 12 that extracts, from the input question sentence, an entity that is the object of the question; a perplexity calculation unit 13 that calculates, for the input question sentence, a perplexity value based on a language model built for question sentences in the specific field; and a determination unit 14 that determines whether the input question sentence belongs to the specific field based on the entity extraction result and the calculated perplexity value.

Description

The present invention relates to a question field determination device and a question field determination method for determining the field of a question sentence.

With the development of speech recognition and question answering technologies, agent-type services that respond to questions users pose in natural language are becoming widespread. Such services can answer a wide variety of questions, for example about the weather, transit route guidance, chat, and objective facts about history, society, and so on. Users can ask in natural sentences such as "Tell me today's weather," and a distinctive feature is that the service can be used as if the user had a personal secretary inside a personal computer or mobile terminal.

Patent Document 1 discloses a typical system configuration for question answering. It analyzes the content of a user's question, classifies the question type according to the so-called 5W1H framework (Who, Where, When, and so on), and extracts answers with a different method for each type. Patent Document 2 discloses a typical framework for chat conversation.

Patent Document 1: Japanese Patent Laid-Open No. 2002-132811
Patent Document 2: Japanese Patent No. 5150583

The method for generating an answer differs greatly depending on the field of the user's question, such as the weather, transit route guidance, or chat. An agent-type service therefore has to determine the target field of a question (question field determination) before actually generating the answer. In particular, some services display a character that visualizes the agent on the screen so that the user feels as if talking to someone. Questions addressed to that character (hereinafter called chat; for example, "How old are you?") and questions about objective facts (for example, "How old is the prime minister?") are very similar in sentence structure but require very different handling, so a method for telling them apart is important.

The method of Patent Document 1 classifies the question type by simple features of the sentence; for example, a question is classified as Who if the sentence ends with "who" (だれ). Chat and questions about objective facts are particularly hard to distinguish by simple features such as sentence-final expressions. The method of Patent Document 1 therefore cannot determine the question field appropriately, and a determination method specialized for this purpose is needed. Furthermore, because the input to an agent-type service mixes chat with questions about objective facts, even a system centered on chat conversation, such as that of Patent Document 2, needs a way to distinguish them.

The present invention has been made in view of the above problems. Its object is to provide a question field determination device and a question field determination method that accurately determine whether a question sentence belongs to a specific field, so that, for example, a system can respond to a user's questions with answers covering both chat and objective facts.

To achieve the above object, a question field determination device according to the present invention includes: question sentence input means for inputting a question sentence; entity extraction means for extracting, from the question sentence input by the question sentence input means, an entity that is the object of the question; perplexity calculation means for calculating, for the question sentence input by the question sentence input means, a perplexity value based on a language model built for question sentences in a specific field; and determination means for determining, based on the entity extraction result of the entity extraction means and the perplexity value calculated by the perplexity calculation means, whether the question sentence input by the question sentence input means belongs to the specific field.

The question field determination device according to the present invention determines whether a question sentence belongs to the specific field by considering both the result of extracting an entity from the question sentence and the perplexity value calculated from it. This makes it possible to determine accurately whether a question sentence belongs to the specific field.

The perplexity calculation means may calculate the perplexity value based on the sequential context of the words in the question sentence. With this configuration, the perplexity value can be calculated appropriately and reliably, and the present invention can be implemented appropriately and reliably.

The determination means may set a threshold according to the entity extraction result and compare the set threshold with the perplexity value to determine whether the question sentence belongs to the specific field. With this configuration, the determination can be made appropriately and reliably, and the present invention can be implemented appropriately and reliably.

The question field determination device may further include: first answering means that, when the determination means determines that the question sentence belongs to the specific field, generates and outputs an answer to the question sentence by a first method; and second answering means that, when the determination means determines that the question sentence belongs to a field other than the specific field, generates and outputs an answer to the question sentence by a second method different from the first method. With this configuration, an answer to the question sentence can be output.

The determination means may calculate, based on the entity extraction result and the perplexity value, the degree to which the question sentence belongs to the specific field, and the question field determination device may further include information addition requesting means that, according to the degree calculated by the determination means, requests additional information about the entity extracted by the entity extraction means in order to answer the question sentence. With this configuration, the information needed to understand the question can be obtained.

In addition to being described as a question field determination device as above, the present invention can also be described as a question field determination method, as follows. The two differ only in category; they are substantially the same invention and have the same operations and effects.

That is, a question field determination method according to the present invention includes: a question sentence input step of inputting a question sentence; an entity extraction step of extracting, from the question sentence input in the question sentence input step, an entity that is the object of the question; a perplexity calculation step of calculating, for the question sentence input in the question sentence input step, a perplexity value based on a language model built for question sentences in a specific field; and a determination step of determining, based on the entity extraction result of the entity extraction step and the perplexity value calculated in the perplexity calculation step, whether the question sentence input in the question sentence input step belongs to the specific field.

In the present invention, whether a question sentence belongs to the specific field is determined by considering both the result of extracting an entity from the question sentence and the perplexity value calculated from it. This makes it possible to determine accurately whether a question sentence belongs to the specific field.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a diagram showing the configuration of a question answering device, which is the question field determination device according to the first embodiment of the present invention, and a question answering system including the question answering device.
FIG. 2 is a table showing information for generating answers to question sentences about things other than objective facts.
FIG. 3 is a diagram showing the hardware configuration of the question answering device, which is the question field determination device according to an embodiment of the present invention.
FIG. 4 is a flowchart showing the processing (question field determination method) executed by the question answering device according to the first embodiment of the present invention.
FIG. 5 is a diagram showing the configuration of a question answering device, which is the question field determination device according to the second embodiment of the present invention, and a question answering system including the question answering device.
FIG. 6 is a table showing entity-property knowledge data.
FIG. 7 is a table showing a user's past question history.

Embodiments of the question field determination device and the question field determination method according to the present invention are described in detail below with reference to the drawings. In the description of the drawings, the same elements are given the same reference numerals, and redundant description is omitted.

FIG. 1 shows a question answering device 10, which is the question field determination device according to the first embodiment of the present invention, and a question answering system 1 that includes the question answering device 10. The question answering system 1 includes the question answering device 10 and a user terminal 20. In the question answering system 1, the user terminal 20 sends a question sentence to the question answering device 10, which generates a response to the question sentence and sends it back to the user terminal 20.

The user terminal 20 is an ordinary terminal (device) with input means, such as a keyboard or a voice input function, for entering a question sentence from the user, and output means, such as a monitor, for outputting the answer to the question sentence. The user terminal 20 can communicate with the question answering device 10 over a communication network, for example a public network such as the Internet or a mobile communication network, or a dedicated line. Connections between devices in this embodiment are not limited to such networks; they may also be a communication bus within the same physical device, or a combination of these. Concretely, the user terminal 20 is a personal computer, mobile phone, or similar device used by the user.

The user terminal 20 receives a question sentence through a user operation on the input means described above. The question sentence may, for example, be typed by the user on a keyboard, or obtained by speech recognition of the user's voice. The user terminal 20 transmits the question sentence to the question answering device 10 as text (plain text). Examples of question sentences include "What is your name?" (あなたのお名前は), "What is the name of the US President?" (アメリカ大統領の名前は), and "Who was the first human astronaut?" (人類初の宇宙飛行士は誰).

Question sentences fall into several fields; for example, there are questions about objective facts and questions about things other than objective facts. A question about objective facts concerns facts, such as guidance, chat, history, or society, that are unrelated to the question answering device 10 of this embodiment. Of the example questions above, "What is the name of the US President?" and "Who was the first human astronaut?" are questions about objective facts.

A question about something other than objective facts presupposes the existence of the question answering device 10 of this embodiment. For example, it is a question addressed to the character (a character that visualizes the agent) displayed on the user terminal 20 for question answering with the question answering device 10. Of the example questions above, "What is your name?" is a question about something other than objective facts.

The user terminal 20 receives the answer that the question answering device 10 sends in response to the transmitted question sentence, and outputs it, for example by displaying it. In this way, the question answering system 1 can respond to a user's questions with answers covering both chat and objective facts.

Next, the functions of the question answering device 10 of this embodiment are described. As shown in FIG. 1, the question answering device 10 includes a question sentence input unit 11, an entity extraction unit 12, a perplexity calculation unit 13, a determination unit 14, an objective fact response unit 15, and an other-question response unit 16.

The question sentence input unit 11 is question sentence input means for inputting a question sentence. Specifically, the question sentence input unit 11 inputs a question sentence by receiving, from the user terminal 20, the question sentence as text (plain text) data. The question sentence input unit 11 outputs the input question sentence to the entity extraction unit 12, the perplexity calculation unit 13, and the determination unit 14.

The entity extraction unit 12 is entity extraction means that extracts, from the question sentence input from the question sentence input unit 11, the entity that is the object of the question. This entity indicates what the question asks about and is the subject of the question sentence. For example, in "What is the name of the US President?" (アメリカ大統領の名前は) the entity is "US President" (アメリカ大統領), and in "Who was the first human astronaut?" (人類初の宇宙飛行士は誰) it is "the first human astronaut" (人類初の宇宙飛行士). Specifically, the entity extraction unit 12 extracts an entity as follows, for example.

Entity extraction applies sequence labeling, a machine learning technique. Sequence labeling assigns predefined labels to sequence data such as a word sequence. An example of how sequence labeling is applied to entity extraction in this embodiment is described below.

The entity extraction unit 12 divides the text of the input question sentence into morphemes, the smallest meaningful units. A conventional morphological analyzer can be used for this. For the question sentence アメリカ大統領の名前は ("What is the name of the US President?"), the result of morphological analysis is アメリカ／大統領／の／名前／は (America / president / no [possessive particle] / name / wa [topic particle]), where "／" marks a morpheme boundary.

The entity extraction unit 12 then assigns to each morpheme one of three labels in what is called the BIO format: B marks the beginning (the first morpheme) of the span to be detected (here, the entity), I marks a morpheme inside that span, and O marks a morpheme outside it. In the example above, the labeling result is, for instance, アメリカ(B)／大統領(I)／の(O)／名前(O)／は(O), where the (B), (I), and (O) after each morpheme indicate the assigned label. Finally, the morpheme labeled B and all consecutive I-labeled morphemes that follow it are taken out as the final entity; in this example, アメリカ(B)／大統領(I), that is, アメリカ大統領 ("US President").
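The final step, turning BIO labels into an entity string, can be sketched in Python as below. This is an illustration only, not the device's implementation: the function name decode_bio is hypothetical, and the sketch assumes one entity per question (it returns the first B-initiated span) and joins Japanese morphemes without spaces.

```python
from typing import List, Optional


def decode_bio(morphemes: List[str], labels: List[str]) -> Optional[str]:
    """Return the first B-labelled morpheme plus the I-labelled morphemes that follow it.

    Hypothetical helper: assumes one entity per question and joins Japanese
    morphemes without spaces. Returns None when no B label is present.
    """
    if len(morphemes) != len(labels):
        raise ValueError("morphemes and labels must have the same length")
    for start, label in enumerate(labels):
        if label == "B":
            end = start + 1
            # Extend the span over consecutive I labels.
            while end < len(labels) and labels[end] == "I":
                end += 1
            return "".join(morphemes[start:end])
    return None  # no entity could be extracted


if __name__ == "__main__":
    morphemes = ["アメリカ", "大統領", "の", "名前", "は"]
    labels = ["B", "I", "O", "O", "O"]
    print(decode_bio(morphemes, labels))  # アメリカ大統領
```

A return value of None corresponds to the case in which the entity extraction unit 12 reports that no entity could be extracted.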

In this example, each of the five morphemes can receive one of three labels, so there are 3^5 = 243 possible labelings. Sequence labeling methods generally use machine learning to compute the conditional probability of a label sequence given the sequence to be labeled (for example, a morpheme sequence). The relationship between input sequences and labels is learned in advance from character strings annotated with correct labels. At labeling time, this learned relationship is used to compute the probability of each possible labeling (243 in this example), and the most plausible label sequence is output.
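The 243-way search can be made concrete with a toy model. The sketch below scores every label sequence with hand-set, hypothetical per-morpheme scores plus a penalty for invalid BIO transitions (I after O or at the sentence start), normalizes the scores into conditional probabilities with a softmax, and picks the most probable sequence. It is a brute-force illustration under these assumptions, not CRF++'s feature model; practical CRF tools learn the scores from labeled data and use dynamic programming (the Viterbi and forward algorithms) instead of enumerating every sequence.

```python
import itertools
import math
from typing import Dict, List, Tuple

LABELS = ("B", "I", "O")
INVALID_TRANSITION_PENALTY = 10.0  # hypothetical constant


def sequence_score(labels: Tuple[str, ...], emission: List[Dict[str, float]]) -> float:
    """Sum of per-morpheme label scores minus a penalty for each invalid BIO transition."""
    score = sum(emission[i][label] for i, label in enumerate(labels))
    previous = ("O",) + labels[:-1]  # treat the sentence start like an O label
    for prev, cur in zip(previous, labels):
        if cur == "I" and prev == "O":  # I must follow B or I
            score -= INVALID_TRANSITION_PENALTY
    return score


def label_distribution(emission: List[Dict[str, float]]) -> Dict[Tuple[str, ...], float]:
    """Conditional probability of every label sequence (softmax over all 3**n candidates)."""
    candidates = list(itertools.product(LABELS, repeat=len(emission)))
    scores = [sequence_score(c, emission) for c in candidates]
    max_score = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - max_score) for s in scores]
    total = sum(weights)
    return {c: w / total for c, w in zip(candidates, weights)}


# Hypothetical scores for アメリカ／大統領／の／名前／は
emission = [
    {"B": 2.0, "I": 0.0, "O": 0.0},  # アメリカ
    {"B": 0.0, "I": 2.0, "O": 0.5},  # 大統領
    {"B": 0.0, "I": 0.0, "O": 2.0},  # の
    {"B": 0.5, "I": 0.0, "O": 1.5},  # 名前
    {"B": 0.0, "I": 0.0, "O": 2.0},  # は
]

distribution = label_distribution(emission)
best = max(distribution, key=distribution.get)
print(len(distribution))  # 243
print(best)  # ('B', 'I', 'O', 'O', 'O')
```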

Conventional methods can be used for sequence labeling. For example, the above operation can be implemented with any of the many commercially available or freely distributed tools, such as CRF++.

Entity extraction does not necessarily require machine learning. For example, the entity may be extracted by a simple rule that, when an expression marking the subject such as "XXの" ("XX's") appears at the beginning of the sentence, treats the "XX" part as the entity, or by a method that extracts the part in the nominative case based on syntactic analysis.
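A minimal sketch of the simple rule-based alternative follows, assuming the question is a Japanese string and that the first の at the start of the sentence ends the subject; the pattern and the function name extract_entity_by_rule are hypothetical. The second output shows the rule's weakness: for 人類初の宇宙飛行士は誰 it returns only 人類初 ("humanity's first") instead of the intended entity 人類初の宇宙飛行士, which is one reason a learned sequence labeler can be preferable.

```python
import re
from typing import Optional

# Hypothetical rule: the text before the first の at the start of the sentence is the entity.
LEADING_NO = re.compile(r"^(.+?)の")


def extract_entity_by_rule(question: str) -> Optional[str]:
    """Return the XX in a sentence-initial "XXの", or None if the pattern is absent."""
    match = LEADING_NO.match(question)
    return match.group(1) if match else None


print(extract_entity_by_rule("アメリカ大統領の名前は"))  # アメリカ大統領
print(extract_entity_by_rule("人類初の宇宙飛行士は誰"))  # 人類初 (shorter than the intended entity)
```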

The entity extraction unit 12 outputs the extracted entity to the determination unit 14. An entity cannot always be extracted, for example because the question sentence is not a well-formed question or contains no entity. In that case, the entity extraction unit 12 notifies the determination unit 14 that no entity could be extracted.

The perplexity calculation unit 13 is perplexity calculation means that calculates, for the question sentence input from the question sentence input unit 11, a perplexity value (ppl) based on a language model built for question sentences in a specific field. This perplexity value indicates how well the input question sentence fits, on the assumption that it is a question sentence of that specific field. In this embodiment, the specific field is questions about objective facts. A language model is, for example, a model of the relationships between words and between parts of speech in a given language. In this embodiment, the perplexity value (the objective-fact perplexity) is calculated as follows, using a language model based on the context of the words in the question sentence.

Language models can be represented in various ways; this embodiment uses an N-gram model. For example, with N = 2 the model represents the probability of a two-word sequence, $P(w_i \mid w_{i-1})$, where $w_i$ is the i-th word of the sentence. Under the assumption that each word depends only on the immediately preceding word, the probability of a sentence of n words can then be calculated as the product of these conditional probabilities:

$$P(w_1, \ldots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_{i-1})$$
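The product of conditional probabilities can be illustrated with a hand-written bigram table. In this sketch the probability values are invented for illustration, and the sentence-start symbol <s> (used to condition the first word) is an assumption the text does not spell out. A real model would also smooth unseen bigrams instead of assigning them zero.

```python
from typing import Dict, List, Tuple

BigramTable = Dict[Tuple[str, str], float]  # (previous word, word) -> P(word | previous word)


def sentence_probability(words: List[str], bigram: BigramTable, bos: str = "<s>") -> float:
    """P(w_1, ..., w_n) approximated as the product of P(w_i | w_{i-1}).

    w_1 is conditioned on a sentence-start symbol; unseen bigrams get probability 0.
    """
    probability = 1.0
    previous = bos
    for word in words:
        probability *= bigram.get((previous, word), 0.0)
        previous = word
    return probability


# Hypothetical probabilities, for illustration only
table: BigramTable = {
    ("<s>", "アメリカ"): 0.1,
    ("アメリカ", "大統領"): 0.5,
    ("大統領", "の"): 0.4,
    ("の", "名前"): 0.2,
    ("名前", "は"): 0.5,
}
p = sentence_probability(["アメリカ", "大統領", "の", "名前", "は"], table)
print(p)  # about 0.002 (0.1 * 0.5 * 0.4 * 0.2 * 0.5)
```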

A language model is created by training on example sentences from the intended application field. Conventional methods can be used to build it; for example, general-purpose tools such as SRILM are available, and a language model can be created by feeding example sentences to them. This embodiment uses a language model created only from example question sentences about objective facts, with questions of other intents, such as chat, removed.
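As a rough stand-in for a toolkit such as SRILM, the sketch below trains a bigram model from pre-segmented example questions about objective facts. The add-one smoothing, the <s>/</s> boundary symbols, the function name train_bigram, and the two-sentence corpus are all assumptions for illustration; dedicated toolkits provide more refined smoothing and backoff, and a usable model needs a far larger corpus.

```python
from collections import Counter
from typing import Callable, Iterable, List, Set, Tuple


def train_bigram(
    sentences: Iterable[List[str]], bos: str = "<s>", eos: str = "</s>"
) -> Tuple[Callable[[str, str], float], Set[str]]:
    """Train an add-one-smoothed bigram model; return (prob(word, previous), vocabulary)."""
    history_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    vocabulary: Set[str] = {eos}
    for words in sentences:
        vocabulary.update(words)
        sequence = [bos] + list(words) + [eos]
        for previous, word in zip(sequence, sequence[1:]):
            history_counts[previous] += 1
            bigram_counts[(previous, word)] += 1
    size = len(vocabulary)

    def prob(word: str, previous: str) -> float:
        # Add-one (Laplace) smoothing: (c(previous, word) + 1) / (c(previous) + |V|)
        return (bigram_counts[(previous, word)] + 1) / (history_counts[previous] + size)

    return prob, vocabulary


# Toy training corpus of objective-fact questions (already segmented into morphemes)
corpus = [
    ["アメリカ", "大統領", "の", "名前", "は"],
    ["人類", "初", "の", "宇宙飛行士", "は", "誰"],
]
prob, vocab = train_bigram(corpus)
print(prob("大統領", "アメリカ"))  # 2/11, about 0.1818
```

Because every count is incremented by one, a word pair that never occurs in the objective-fact corpus still gets a small non-zero probability, so a question with unusual wording yields a high but finite perplexity.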

The perplexity calculation unit 13 stores information on the above language model in advance and calculates the perplexity value from it. The perplexity calculation unit 13 divides the text of the input question sentence into morphemes, the smallest meaningful units, in the same way as the entity extraction unit 12. Alternatively, the perplexity calculation unit 13 may reuse the morpheme segmentation produced by the entity extraction unit 12, or, for example, the question sentence input unit 11 may perform the segmentation before the entity extraction unit 12 and the perplexity calculation unit 13 run, and pass the result to both.

The perplexity calculation unit 13 calculates the entropy H from the positions of the words (the morphemes obtained by segmentation) in the question sentence, using the following expression, where n is the number of words:

$$H = -\frac{1}{n} \sum_{i=1}^{n} \log_2 P(w_i \mid w_{i-1}, w_{i-2}, \ldots, w_1)$$

At each word position, $P(w_i \mid w_{i-1}, w_{i-2}, \ldots, w_1)$ is the probability that the word appears given the preceding word sequence (for example, the probability that 名前 "name" appears after アメリカ／大統領／の). The language model supplies this value; with the bigram model, the full history is approximated by $P(w_i \mid w_{i-1})$. Because H is the average of these (log) probabilities, it indicates roughly how unusual the sentence is under the assumed language model.

The perplexity calculation unit 13 calculates 2 raised to the power of the entropy, 2^H, as the perplexity value. The lower the perplexity value calculated in this way, the better the input question sentence fits the assumption that it is a question about objective facts. The perplexity calculation unit 13 outputs the calculated perplexity value to the determination unit 14.
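Putting the entropy and perplexity definitions together, a minimal Python sketch follows. It assumes a bigram model, so the full history $P(w_i \mid w_{i-1}, \ldots, w_1)$ is approximated by $P(w_i \mid w_{i-1})$, and it accepts any probability function prob(word, previous); the function name perplexity and the <s> start symbol are hypothetical. No end-of-sentence term is included, so the average runs over exactly the n words of the question.

```python
import math
from typing import Callable, List


def perplexity(words: List[str], prob: Callable[[str, str], float], bos: str = "<s>") -> float:
    """Return 2**H, where H = -(1/n) * sum(log2 P(w_i | w_{i-1})) under a bigram model."""
    if not words:
        raise ValueError("the question sentence must contain at least one word")
    log_sum = 0.0
    previous = bos
    for word in words:
        p = prob(word, previous)
        if p <= 0.0:
            return math.inf  # an impossible word makes the sentence infinitely surprising
        log_sum += math.log2(p)
        previous = word
    entropy = -log_sum / len(words)
    return 2.0 ** entropy


# Hypothetical model: every word has probability 1/4 given its predecessor
print(perplexity(["アメリカ", "大統領", "の"], lambda word, previous: 0.25))  # 4.0
```

With a uniform probability of 1/4 the entropy is exactly 2 bits per word, so the perplexity is 4: the model is as uncertain as if it were choosing among four equally likely words at each position.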

The determination unit 14 is determination means that determines, based on the entity extraction result of the entity extraction unit 12 and the perplexity value calculated by the perplexity calculation unit 13, whether the question sentence input by the question sentence input unit 11 is a question about objective facts. The determination unit 14 sets a threshold according to the entity extraction result of the entity extraction unit 12 (whether an entity was extracted) and makes this determination by comparing the set threshold with the perplexity value.

Basically, when the perplexity value is low, the input question sentence is regarded as a question about objective facts. In addition, whether an entity was extracted is taken into account as evidence that increases confidence that the sentence is a question about objective facts.

Specifically, the determination unit 14 stores two thresholds, thr1 and thr2, in advance:

- thr1 is the first threshold. It is used when no entity has been extracted and is set to a smaller value than thr2.
- thr2 is the second threshold. It is used when an entity has been extracted.

For example, thr1 = 10 and thr2 = 100 can be set. The determination unit 14 selects thr1 or thr2 according to the entity extraction result from the entity extraction unit 12. It then compares the selected threshold with the perplexity value.

The comparison may show that the perplexity value is at most the threshold (alternatively, strictly less than it). This means the question is not unusual under the language model that assumes questions about objective facts. In that case, the determination unit 14 determines that the input question is about objective facts and outputs it to the objective fact response unit 15.

The comparison may instead show that the perplexity value exceeds the threshold (alternatively, is at least the threshold). This means the question is unusual under the language model that assumes questions about objective facts. In that case, the determination unit 14 determines that the input question concerns something other than objective facts and outputs it to the other question answering unit 16.
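The threshold selection and comparison in the determination unit 14 can be sketched as below. The sketch uses the example values thr1 = 10 and thr2 = 100 from the text and the "at most the threshold" variant. The function and constant names are hypothetical.

```python
THR1 = 10.0   # used when no entity was extracted (stricter)
THR2 = 100.0  # used when an entity was extracted (more lenient)


def is_objective_fact_question(ppl: float, entity_extracted: bool,
                               thr1: float = THR1, thr2: float = THR2) -> bool:
    """Return True if the question should go to the objective fact
    response unit, False if it should go to the other question answering unit."""
    threshold = thr2 if entity_extracted else thr1
    # "At most the threshold" variant; the text also allows a strict "<".
    return ppl <= threshold


print(is_objective_fact_question(50.0, entity_extracted=True))   # lenient threshold
print(is_objective_fact_question(50.0, entity_extracted=False))  # strict threshold
```

The same perplexity of 50 is accepted when an entity was found but rejected when none was. This is the flexibility the entity information is meant to add.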

The objective fact response unit 15 is a first answering means. When the determination unit 14 has determined that the question is about objective facts, this unit generates and outputs an answer by a first method. In a simple configuration, for example, it generates the answer as follows:

1. When a question is input from the determination unit 14, the objective fact response unit 15 divides the question text into morphemes, the smallest meaningful units. The morpheme segmentation already produced by, for example, the entity extraction unit 12 may be reused.
2. The unit then sends a keyword search request (query) to the external search engine 30, using the nouns among the resulting morphemes as keywords.

The external search engine 30 is a device that receives a keyword search request and searches target documents (for example, Web pages) using the keywords in the request. The search extracts the targets that contain the keywords. The engine then sends the requester either information identifying those targets or the targets themselves, for example the URL of a Web page or the Web page itself. The question answering apparatus 10 and the external search engine 30 are connected via a communication network and can communicate with each other.

The objective fact response unit 15 outputs the external search engine 30's response as the generated answer to the input question. Specifically, it sends the URL of a Web page, or the Web page itself, to the user terminal 20.

For example, take the input question 「人類初の宇宙飛行士は」 ("Who was the first astronaut in human history?"). The nouns 人類 ("humanity"), 初 ("first"), and 宇宙飛行士 ("astronaut") are extracted as search keywords. The Web pages retrieved with them, that is, pages containing those keywords, are then sent from the objective fact response unit 15 (question answering apparatus 10) to the user terminal 20 as the answer.
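The noun-keyword step can be sketched as follows. This is a minimal sketch under stated assumptions:

- The question has already been segmented into `(surface, part_of_speech)` pairs, for example by a Japanese morphological analyzer.
- Nouns carry the tag 名詞.
- The search engine accepts space-separated keywords.

The tags in the example are written by hand to mirror the text, not produced by a real analyzer. `extract_keywords` and `build_query` are hypothetical names, and no real search API is called.

```python
from typing import List, Sequence, Tuple

Token = Tuple[str, str]  # (surface form, part of speech)


def extract_keywords(tokens: Sequence[Token], noun_tag: str = "名詞") -> List[str]:
    """Keep only the nouns, in sentence order, as search keywords."""
    return [surface for surface, pos in tokens if pos == noun_tag]


def build_query(tokens: Sequence[Token], noun_tag: str = "名詞") -> str:
    """Join the noun keywords into a single space-separated query string."""
    return " ".join(extract_keywords(tokens, noun_tag))


# 「人類初の宇宙飛行士は」 segmented by hand for illustration.
tokens = [("人類", "名詞"), ("初", "名詞"), ("の", "助詞"),
          ("宇宙飛行士", "名詞"), ("は", "助詞")]
print(build_query(tokens))
```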

Alternatively, as described in Patent Document 1, the objective fact response unit 15 may output a specific word as the answer. That word is selected from the words in the retrieved documents based on its relationship to the search keywords. More generally, the objective fact response unit 15 may be implemented by any function that takes a question as text and answers it on the assumption that it asks about objective facts.

The other question answering unit 16 is a second answering means. When the determination unit 14 has determined that the question concerns something other than objective facts, this unit generates and outputs an answer by a second method, different from the first. Specifically, it answers questions that are not objective questions, for example by responding to chat according to rules stored in advance. It stores information associating user questions with answers, as shown in the table of FIG. 2, and generates the answer from that information.

In the table shown in FIG. 2, the other question answering unit 16 finds the user question that matches the question input from the determination unit 14. It takes the associated answer and sends it to the user terminal 20 as the generated answer to the input question.
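The table lookup can be sketched as a dictionary keyed by the stored user question. The contents of FIG. 2 are not reproduced here, so the `RULES` entries below are hypothetical placeholders. Returning `None` when nothing matches is also an assumption, since the text does not say what happens in that case.

```python
from typing import Dict, Optional

# Hypothetical rule table standing in for FIG. 2 (user question -> answer).
RULES: Dict[str, str] = {
    "hello": "Hello! How can I help you?",
    "how are you": "I'm fine, thank you.",
}


def answer_by_rule(question: str, rules: Dict[str, str] = RULES) -> Optional[str]:
    """Return the stored answer whose user question exactly matches the input
    (ignoring surrounding whitespace), or None if there is no match."""
    return rules.get(question.strip())


print(answer_by_rule("hello"))
```

An exact match keeps the sketch simple. A real rule base would likely normalize the text or use patterns instead.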

The other question answering unit 16 may likewise be implemented by any function that takes a question as text and answers it on the assumption that it concerns something other than objective facts. This concludes the description of the functional configuration of the question answering apparatus 10.

FIG. 3 shows the hardware configuration of the question answering apparatus 10 according to this embodiment. As shown in FIG. 3, the apparatus is a computer that includes the following hardware:

- a CPU (Central Processing Unit) 101;
- a RAM (Random Access Memory) 102 and a ROM (Read Only Memory) 103, which serve as main storage;
- a communication module 104 for communication;
- an auxiliary storage device 105 such as a hard disk.

These components operate under a program or the like to provide the functions of the question answering apparatus 10 described above. This concludes the description of the configuration of the question answering apparatus 10 according to this embodiment.

Next, the question field determination method is described with reference to the flowchart of FIG. 4. This method is the processing executed by the question answering apparatus 10 according to this embodiment. The processing starts when a question is sent from the user terminal 20 to the question answering apparatus 10. First, the question sentence input unit 11 receives and inputs the question (S01, question input step). It then outputs the question to the entity extraction unit 12 and the perplexity calculation unit 13.

Next, the entity extraction unit 12 extracts entities from the question input from the question sentence input unit 11 (S02, entity extraction step). It notifies the determination unit 14 of the extraction result.

Meanwhile, the perplexity calculation unit 13 calculates the perplexity value of the question input from the question sentence input unit 11 (S03, perplexity calculation step). It outputs the value to the determination unit 14. Because S02 and S03 are independent of each other, they need not run in the order described above.

Next, the determination unit 14 determines whether the input question is about objective facts, based on the entity extraction result and the perplexity value, as follows. First, it checks whether an entity was extracted (S04, determination step). If no entity was extracted (NO in S04), the determination unit 14 sets thr1 as the threshold and compares the perplexity value with thr1 (S05, determination step). In other words, it checks whether the perplexity value is quite low.

If an entity was extracted (YES in S04), the determination unit 14 sets thr2 as the threshold and compares the perplexity value with thr2 (S06, determination step). In other words, it checks whether the perplexity value is reasonably low.

If the perplexity value is at most the threshold, thr1 or thr2 (YES in S05 or S06), the determination unit 14 determines that the input question is about objective facts. It outputs the question to the objective fact response unit 15. The objective fact response unit 15 then generates an answer and outputs it by sending it to the user terminal 20 (S07, first answering step). It generates the answer by sending a keyword search request to the external search engine 30.

If the comparison in S05 or S06 shows that the perplexity value exceeds the threshold, thr1 or thr2 (NO in S05 or S06), the determination unit 14 determines that the input question concerns something other than objective facts. It outputs the question to the other question answering unit 16. The other question answering unit 16 then generates an answer and outputs it by sending it to the user terminal 20 (S08, second answering step). It generates the answer by referring to the table shown in FIG. 2.

The user terminal 20 receives the answer sent in S07 or S08 and outputs it in a form the user can perceive. This concludes the description of the question field determination method executed by the question answering apparatus 10 according to this embodiment.
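Steps S02 to S08 can be sketched as a single routing function. The entity extractor, perplexity calculator, and the two answering units are passed in as callables, so any concrete implementation can be plugged in. These parameter names are hypothetical and stand for units 12, 13, 15, and 16.

```python
from typing import Callable, Optional


def handle_question(question: str,
                    extract_entity: Callable[[str], Optional[str]],
                    compute_ppl: Callable[[str], float],
                    answer_fact: Callable[[str], str],
                    answer_other: Callable[[str], str],
                    thr1: float = 10.0, thr2: float = 100.0) -> str:
    """Route one question (S01 already done) through S02-S08."""
    entity = extract_entity(question)                  # S02: entity extraction
    ppl = compute_ppl(question)                        # S03: independent of S02
    threshold = thr2 if entity is not None else thr1   # S04: choose threshold
    if ppl <= threshold:                               # S05 / S06: compare
        return answer_fact(question)                   # S07: first answering step
    return answer_other(question)                      # S08: second answering step


# Toy stand-ins for the units, for illustration only.
print(handle_question("Who is the President of America?",
                      extract_entity=lambda q: "America",
                      compute_ppl=lambda q: 50.0,
                      answer_fact=lambda q: "search result",
                      answer_other=lambda q: "chat reply"))
```

Passing the units as functions mirrors the flowchart. S02 and S03 do not depend on each other, and only S04 to S06 combine their results.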

As described above, this embodiment decides whether a question is about objective facts by considering two things: the entities extracted from the question and the perplexity value calculated from it. An entity is extracted when the question is well formed as a question and contains the part corresponding to its subject. This embodiment therefore uses the perplexity value together with the presence of an entity to judge how confidently the question can be regarded as being about objective facts. Using whether an entity could be extracted, rather than the perplexity value alone, lets the perplexity value be evaluated more flexibly. It also makes the determination more accurate. As a result, the question answering system 1, which answers user questions with both chat and objective facts, can respond to the user's questions appropriately.

This embodiment determines whether a question is about objective facts, but the same approach can determine whether a question belongs to any other field. In that case, the perplexity value is calculated with a language model that assumes questions in that field.

As in this embodiment, the perplexity value may be calculated with a language model based on the sequential context of the words in the question. This allows the perplexity value, and hence the invention as a whole, to be computed appropriately and reliably. However, if the perplexity value can be calculated by another method, for example with a different language model, that method may be used instead.

As in this embodiment, a threshold may be set according to the entity extraction result, and the determination made by comparing that threshold with the perplexity value. This allows the determination, and hence the invention, to be carried out appropriately and reliably.

As in this embodiment, means for generating answers to questions may be provided, such as the objective fact response unit 15 and the other question answering unit 16. This configuration allows answers to questions to be output. However, generating an answer is not essential to the present invention. The invention may simply determine whether a question is about objective facts and output the determination result.

In this embodiment, all of the functions according to the present invention are provided in the question answering apparatus 10. However, they need not all be provided there; the physical placement of the functions is not limited to that of the embodiment described above. For example, some of the functions may be provided in the user terminal 20. The question answering apparatus 10 and the user terminal 20 then together constitute the question field determination device (question field determination system) according to the present invention. The question field determination device may thus consist of multiple physical devices.

Next, a modification of this embodiment is described. In the embodiment above, the determination uses only whether the entity extraction unit 12 could extract an entity, that is, a binary "present" or "absent". However, entity extraction can also produce a probability value indicating the degree of confidence, and this value can be used in the determination. Many sequence labeling methods, such as CRF (conditional random fields), compute the probability of each candidate label sequence given the input and output the sequence with the highest probability. That probability value can be used here.

In this case, the determination unit 14 chooses the threshold by range of the probability value rather than by entity presence or absence. For example:

- thr1 = 10 for a probability of at least 0.0 and less than 0.3;
- thr2 = 50 for at least 0.3 and less than 0.7;
- thr3 = 100 for at least 0.7 and at most 1.0.

The chosen threshold is then compared with the perplexity value. This takes the confidence of the entity extraction flexibly into account when judging how well the question fits the objective-fact language model. For example, an entity may be extracted but its label sequence may have a low probability. Compared with using only two thresholds, the question must then have a somewhat lower perplexity than usual before it is regarded as a question about objective facts.
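The range-based threshold selection can be sketched with a sorted list of range boundaries. The sketch uses the example ranges and values from the text (thr1 = 10, thr2 = 50, thr3 = 100). How the entity-sequence probability is obtained from the labeler is out of scope and assumed to be given. The function names are hypothetical.

```python
from bisect import bisect_right

# Range boundaries [0.0, 0.3), [0.3, 0.7), [0.7, 1.0] and their thresholds.
_BOUNDS = [0.3, 0.7]
_THRESHOLDS = [10.0, 50.0, 100.0]  # thr1, thr2, thr3


def threshold_for_confidence(p: float) -> float:
    """Pick the perplexity threshold for an entity-sequence probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    # bisect_right puts p == 0.3 into the second range and p == 0.7 into the
    # third, matching the "at least ... and less than ..." ranges above.
    return _THRESHOLDS[bisect_right(_BOUNDS, p)]


def is_objective_fact_question(ppl: float, entity_prob: float) -> bool:
    """Regard the question as objective-fact if ppl is at most the threshold."""
    return ppl <= threshold_for_confidence(entity_prob)


print(threshold_for_confidence(0.5))
```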

Next, a second embodiment of the present invention is described. FIG. 5 shows a question answering apparatus 10a, which is the question field determination device according to the second embodiment, and a question answering system 1a that includes it. Parts of the second embodiment not specifically described are the same as in the first embodiment. The second embodiment differs from the first in three ways:

- The determination unit 14a has functions that differ from those of the determination unit 14 of the first embodiment.
- The objective fact response unit 15a has functions that differ from those of the objective fact response unit 15 of the first embodiment.
- The question answering apparatus 10a has a new functional unit, the additional information prompting unit 17a, which the question answering apparatus 10 of the first embodiment lacks.

The determination unit 14a calculates the degree to which the question is about objective facts, based on the entity extraction result from the entity extraction unit 12 and the perplexity value from the perplexity calculation unit 13. It then uses that degree to determine whether the question is about objective facts. As the degree, the determination unit 14a calculates the probability that the question is about objective facts. In other words, it does not decide directly from the entity extraction result and the perplexity value. Instead, it first calculates this probability as a value between 0 and 1.

The probability that the question is about objective facts is obtained with a classification method such as a naive Bayes classifier. Here this probability is written $P(t=1 \mid ppl, e)$. In this notation, $t=1$ means the question is about objective facts and $t=0$ means it is not, $ppl$ is the perplexity value, and $e$ indicates whether an entity was extracted.

A naive Bayes classifier uses a model learned from example questions about objective facts and example questions that are not. It computes $P(t=1 \mid ppl, e)$ as

$$P(t=1 \mid ppl, e) = \frac{P(ppl, e \mid t=1)\, P(t=1)}{P(ppl, e)}$$

The right-hand side multiplies two probabilities and divides by a third:

- $P(ppl, e \mid t=1)$ is the probability of the pair (perplexity value, entity extraction presence or absence) given a question about objective facts.
- $P(t=1)$ is the probability that a question is about objective facts.
- $P(ppl, e)$ is the probability of the same pair over all questions.

Each of these values can be estimated from a sufficiently large set of example questions.

The "target pair" below is the perplexity value calculated by the perplexity calculation unit 13 together with the entity extraction result from the entity extraction unit 12. Each term is estimated as a proportion of the example questions:

- $P(ppl, e \mid t=1)$ is the proportion of the example questions about objective facts in which the target pair appears.
- $P(t=1)$ is the proportion of questions about objective facts among all examples, including both objective-fact and other questions.
- $P(ppl, e)$ is the proportion of all examples in which the target pair appears.
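The count-based estimate can be sketched as follows. This is a sketch under one explicit assumption: a raw perplexity value is continuous, so it rarely repeats exactly. The sketch therefore discretizes it into bins. The bin edges 10 and 100 and the names `bin_perplexity` and `posterior_objective` are hypothetical.

The sketch applies the formula above as written, on the joint pair $(ppl, e)$. With these count estimates it reduces to the fraction of examples with the target pair that are labeled $t=1$. A naive Bayes variant would additionally factor $P(ppl, e \mid t)$ into $P(ppl \mid t)\,P(e \mid t)$.

```python
from bisect import bisect_right
from typing import Hashable, Iterable, Sequence, Tuple

# One labeled example: (perplexity bin, entity extracted?, label t in {0, 1}).
Example = Tuple[Hashable, bool, int]


def bin_perplexity(ppl: float, edges: Sequence[float] = (10.0, 100.0)) -> int:
    """Map a perplexity value to bin 0 (< 10), 1 (10 to < 100) or 2 (>= 100)."""
    return bisect_right(list(edges), ppl)


def posterior_objective(examples: Iterable[Example],
                        ppl_bin: Hashable, entity: bool) -> float:
    """Estimate P(t=1 | ppl, e) = P(ppl, e | t=1) * P(t=1) / P(ppl, e) from counts."""
    data = list(examples)
    n = len(data)
    n_t1 = sum(1 for _, _, t in data if t == 1)
    pair = (ppl_bin, entity)
    n_pair = sum(1 for b, e, _ in data if (b, e) == pair)
    n_pair_t1 = sum(1 for b, e, t in data if (b, e) == pair and t == 1)
    if n == 0 or n_t1 == 0 or n_pair == 0:
        raise ValueError("not enough examples to estimate the probabilities")
    p_pair_given_t1 = n_pair_t1 / n_t1   # P(ppl, e | t=1)
    p_t1 = n_t1 / n                      # P(t=1)
    p_pair = n_pair / n                  # P(ppl, e)
    return p_pair_given_t1 * p_t1 / p_pair


examples = [(0, True, 1), (0, True, 1), (0, True, 0), (2, False, 0)]
print(posterior_objective(examples, bin_perplexity(5.0), True))  # about 0.667
```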

Calculating the probability in this way shows accurately how likely the question is to be about objective facts. The determination unit 14a holds a new preset threshold, Rthr. When it has determined that the question is not about objective facts, it checks whether the calculated probability is at least Rthr. If so, the case can be regarded as one in which the user means to ask about some objective fact but has not given enough information. The user is then prompted for the additional information needed to answer the question.

To do this, the determination unit 14a notifies the additional information prompting unit 17a to prompt (request) additional information from the user. If the entity extraction unit 12 has extracted an entity, the determination unit 14a also passes that entity to the additional information prompting unit 17a.

The additional information prompting unit 17a is an information addition requesting means. It requests additional information about the entity extracted by the entity extraction unit 12 so that the question can be answered. On receiving the notification from the determination unit 14a, it makes the request by sending (replying with) information to the user terminal 20 that prompts for more information. For example, the reply might be: "Please ask a little more specifically. Include a subject, a predicate, and an interrogative word, for example, 'What is the name of the President of America?'"

If an entity has already been extracted, that is, if the determination unit 14a has passed an entity to the additional information prompting unit 17a, the unit can use the entity's string in the reply. For example, suppose the question originally sent from the user terminal 20 was 「アメリカ大統領ですか」 ("The President of America?"). The extracted entity アメリカ大統領 ("President of America") can then be used to reply: "What would you like to know about the President of America?"
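The routing between answering and prompting, and the wording of the prompt, can be sketched as below. The text does not state how the determination unit 14a turns the probability into its objective-fact decision. This sketch assumes a hypothetical decision threshold of 0.5 and an example Rthr of 0.3, with Rthr below the decision threshold. The route labels and function names are also hypothetical.

```python
from typing import Optional


def route(prob: float, decision_thr: float = 0.5, rthr: float = 0.3) -> str:
    """Decide what to do with a question from P(t=1 | ppl, e)."""
    if prob >= decision_thr:
        return "objective_fact"    # answer via the objective fact response unit 15a
    if prob >= rthr:
        return "prompt_more_info"  # hand over to the additional information prompting unit 17a
    return "other"                 # answer via the other question answering unit 16


def prompt_message(entity: Optional[str]) -> str:
    """Build the reply asking the user for more information."""
    if entity:
        # Reuse the extracted entity string, as in the example in the text.
        return f"What would you like to know about {entity}?"
    return ("Please ask a little more specifically. Include a subject, a predicate, "
            "and an interrogative word, for example, "
            "'What is the name of the President of America?'")


print(route(0.4), prompt_message("the President of America"))
```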

The user terminal 20 receives the information and outputs it, for example on a display. From this output, the user recognizes that more information is needed and enters it into the user terminal 20. The user terminal 20 sends the additional information to the question answering apparatus 10a, which receives it. The additional information may be processed in the same way as the original question. Alternatively, the original question and the additional information may be input together to the objective fact response unit 15a, which then generates an answer from both.

The additional information prompting unit 17a may also hold entity-property knowledge data in advance. It can then list the properties that correspond to the extracted entity and prompt the user to choose one. Entity-property knowledge data is one form of data representation used to answer questions about objective facts. It represents a fact as a combination of the entity that the fact is about and the value of a property, which is an attribute of that entity. FIG. 6 shows an example of entity-property knowledge data.

The additional information prompting unit 17a can include in its reply the properties that are associated with the extracted entity in the entity-property knowledge data. With the knowledge data shown in FIG. 6, if the entity "America" has been extracted, the unit prompts for additional information with a message such as "What would you like to know about America? I can answer questions about its president/prime minister, area, and population."
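The following sketch shows one way to hold entity-property knowledge data and turn the properties of an extracted entity into a prompt. It assumes a plain nested dictionary (entity, then property name, then value) as the storage format; the patent does not specify one. The contents loosely follow the FIG. 6 example, using the placeholder values from the text, and the names `KNOWLEDGE`, `join_items`, and `prompt_with_properties` are hypothetical.

```python
from typing import Dict, List

# Hypothetical entity-property knowledge data: entity -> {property -> value}.
# Values are the placeholders used in the text, not real data.
KNOWLEDGE: Dict[str, Dict[str, str]] = {
    "America": {
        "president/prime minister": "xxxxx",
        "area": "yyyyy square meters",
        "population": "zzzzz",
    },
}


def join_items(items: List[str]) -> str:
    """Join items as 'a', 'a and b', or 'a, b, and c'."""
    if len(items) < 3:
        return " and ".join(items)
    return ", ".join(items[:-1]) + ", and " + items[-1]


def prompt_with_properties(entity: str, knowledge: Dict[str, Dict[str, str]]) -> str:
    """Ask the user to choose among the properties known for `entity`."""
    # Property names in insertion order (dicts preserve it in Python 3.7+).
    properties = list(knowledge.get(entity, {}))
    if not properties:
        # No knowledge about this entity: fall back to an open question.
        return f"What would you like to know about {entity}?"
    return (f"What would you like to know about {entity}? "
            f"I can answer questions about its {join_items(properties)}.")
```

A nested dictionary keeps the lookup for one entity to a single key access; a real system would more likely query a triple store, but the prompting logic would be the same.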

The additional information prompting unit 17a may also hold a separate threshold (Pth). If the number of property types associated with the extracted entity in the knowledge data is smaller than this threshold, the unit need not prompt for additional information at all. Instead, it may use the knowledge data to return all of the information on that entity as the answer to the question sentence. With the knowledge data shown in FIG. 6, the reply would be "The president/prime minister of America is Mr. xxxxx. Its area is yyyyy square meters and its population is zzzzz."
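The Pth rule can be sketched as a single comparison on the number of properties. This is an illustrative sketch only: the function `answer_or_prompt`, its tuple return value, and the choice to prompt when the entity has no properties at all (a case the text does not cover) are assumptions. The knowledge data reuses the hypothetical FIG. 6-style dictionary with placeholder values.

```python
from typing import Dict, Tuple

# Hypothetical entity-property knowledge data (placeholder values from the text).
KNOWLEDGE: Dict[str, Dict[str, str]] = {
    "America": {
        "president/prime minister": "xxxxx",
        "area": "yyyyy square meters",
        "population": "zzzzz",
    },
}


def answer_or_prompt(entity: str,
                     knowledge: Dict[str, Dict[str, str]],
                     p_th: int) -> Tuple[str, str]:
    """Return ("answer", text) when the entity has fewer than p_th properties,
    otherwise ("prompt", text) asking the user what they want to know."""
    properties = knowledge.get(entity, {})
    if 0 < len(properties) < p_th:
        # Few enough properties: answer with everything known instead of asking.
        text = " ".join(
            f"The {prop} of {entity} is {value}." for prop, value in properties.items()
        )
        return ("answer", text)
    # Too many properties, or none known (assumption): ask for more information.
    return ("prompt", f"What would you like to know about {entity}?")
```

With three properties for "America", a threshold of 5 returns all three facts directly, while a threshold of 3 falls back to asking the user.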

Furthermore, the additional information prompting unit 17a may hold the user's past question history, that is, question sentences previously sent from the user terminal 20, and return example questions drawn from it. FIG. 7 shows an example of a user's past question history. In this history, each question sentence is associated with the entity extracted from it and the perplexity value calculated from it. The entity associated with a question sentence is, for example, one previously extracted by the entity extraction unit 12, and the associated perplexity value is, for example, one previously calculated by the perplexity calculation unit 13. The past question histories of all users may be used without distinguishing between users.

The additional information prompting unit 17a searches the question history for entries from which the same entity as that of the question sentence being processed was extracted, and prompts for additional information based on those entries. If "America" has been extracted as the entity, the question history example in FIG. 7 yields three entries: "Who is the president of America?", "Who is the president, America", and "America, hello with everyone". In the simplest case, the additional information prompting unit 17a returns all of these entries to the user as examples. For instance, the reply could be "What would you like to know about America? For example: 'Who is the president of America?', 'Who is the president, America', 'America, hello with everyone'."
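A minimal sketch of the history lookup, assuming each history row is stored as a small record of question, entity, and perplexity, mirroring the columns described for FIG. 7. The `HistoryEntry` class, the helper names, and the data in `HISTORY` are hypothetical. In particular, the perplexity values and the "Japan" row are invented for illustration and are not taken from FIG. 7.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HistoryEntry:
    """One row of the question history: question, extracted entity, perplexity."""
    question: str
    entity: str
    perplexity: float


# Hypothetical history; perplexity values are made up for illustration.
HISTORY: List[HistoryEntry] = [
    HistoryEntry("Who is the president of America?", "America", 1200.0),
    HistoryEntry("Who is the president, America", "America", 2500.0),
    HistoryEntry("America, hello with everyone", "America", 8000.0),
    HistoryEntry("What is the capital of Japan?", "Japan", 900.0),
]


def examples_for_entity(entity: str, history: List[HistoryEntry]) -> List[str]:
    """Past questions from which the same entity was extracted."""
    return [h.question for h in history if h.entity == entity]


def prompt_with_examples(entity: str, history: List[HistoryEntry]) -> str:
    """Simplest variant: offer every matching past question as an example."""
    examples = examples_for_entity(entity, history)
    if not examples:
        return f"What would you like to know about {entity}?"
    quoted = ", ".join(f"'{q}'" for q in examples)
    return f"What would you like to know about {entity}? For example: {quoted}"
```

Because histories from all users may be pooled, a linear scan like this would eventually be replaced by an index keyed on entity; the matching rule stays the same.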

Because long-term operation could cause a large number of such history entries to be output, they may be narrowed down by perplexity value. For example, only entries whose perplexity value is at or below a preset threshold (for example, 3000) may be output. In that case, the two entries "Who is the president of America?" and "Who is the president, America" are presented to the user.
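The perplexity filter reduces to keeping entries whose stored value is at or below the threshold. The sketch below uses the example threshold of 3000 from the text. The record layout and the perplexity values in `HISTORY` are hypothetical, chosen only so that the outcome matches the two entries named above. A lower perplexity under the specific-field language model indicates a more typical question for that field, which is why low-perplexity entries make better examples.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HistoryEntry:
    question: str
    entity: str
    perplexity: float


# Hypothetical history rows for the entity "America"; values are illustrative.
HISTORY: List[HistoryEntry] = [
    HistoryEntry("Who is the president of America?", "America", 1200.0),
    HistoryEntry("Who is the president, America", "America", 2500.0),
    HistoryEntry("America, hello with everyone", "America", 8000.0),
]


def filter_by_perplexity(entries: List[HistoryEntry],
                         threshold: float = 3000.0) -> List[HistoryEntry]:
    """Keep only entries whose perplexity is at or below the threshold."""
    return [e for e in entries if e.perplexity <= threshold]
```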

In addition, a criterion may be applied to the quality of the answers as well as to the question sentences, so that only history entries with high answer quality are returned. For example, the number of search results that the objective fact response unit 15a obtained from the external search engine 30 may be used as an evaluation value, and only question sentences that yielded many search results may be used. Which metric is actually used depends on the mechanism by which the objective fact response unit 15a evaluates its answers.
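As one concrete instance of the answer-quality criterion, the sketch below stores the number of search results obtained for each past question and keeps only entries at or above a minimum. The `hit_count` field, the `min_hits` cut-off, and all numbers are hypothetical. The text leaves the actual metric to the answer-evaluation mechanism of the objective fact response unit 15a.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HistoryEntry:
    question: str
    entity: str
    perplexity: float
    hit_count: int  # hypothetical: search results obtained from the external engine


# Hypothetical rows; hit counts are invented for illustration.
HISTORY: List[HistoryEntry] = [
    HistoryEntry("Who is the president of America?", "America", 1200.0, 52000),
    HistoryEntry("Who is the president, America", "America", 2500.0, 310),
]


def filter_by_answer_quality(entries: List[HistoryEntry],
                             min_hits: int) -> List[HistoryEntry]:
    """Keep only entries whose question yielded at least `min_hits` results."""
    return [e for e in entries if e.hit_count >= min_hits]
```

This filter can be chained after the perplexity filter, so that an example must be both a typical question for the field and one that produced a well-supported answer.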

As described above, according to this embodiment, even when it cannot be determined with sufficient confidence whether an input question sentence is a question about objective facts, the information needed to understand the question can be obtained. This in turn makes it possible to respond accurately to the user's question.

Reference Signs List: 1, 1a ... question answering system; 10, 10a ... question answering device; 11 ... question sentence input unit; 12 ... entity extraction unit; 13 ... perplexity calculation unit; 14, 14a ... determination unit; 15, 15a ... objective fact response unit; 16 ... other question response unit; 17a ... additional information prompting unit; 101 ... CPU; 102 ... RAM; 103 ... ROM; 104 ... communication module; 105 ... auxiliary storage device; 20 ... user terminal; 30 ... external search engine.

Claims

A question field determination device comprising:
question sentence input means for inputting a question sentence;
entity extraction means for extracting an entity that is the target of the question from the question sentence input by the question sentence input means;
perplexity calculation means for calculating, from the question sentence input by the question sentence input means, a perplexity value using a language model that assumes question sentences in a specific field; and
determination means for determining, based on the entity extraction result of the entity extraction means and the perplexity value calculated by the perplexity calculation means, whether the question sentence input by the question sentence input means is a question sentence in the specific field.

The question field determination device according to claim 1, wherein the perplexity calculation means calculates the perplexity value based on the context of the words included in the question sentence.

The question field determination device according to claim 1 or 2, wherein the determination means sets a threshold according to the entity extraction result, compares the set threshold with the perplexity value, and thereby determines whether the question sentence is a question sentence in the specific field.

The question field determination device according to any one of claims 1 to 3, further comprising:
first answer means for generating and outputting an answer to the question sentence by a first method when the determination means determines that the question sentence is a question sentence in the specific field; and
second answer means for generating and outputting an answer to the question sentence by a second method different from the first method when the determination means determines that the question sentence is a question sentence in a field other than the specific field.

The question field determination device according to any one of the preceding claims, wherein the determination means calculates, based on the entity extraction result and the perplexity value, the degree to which the question sentence is a question sentence in the specific field,
the device further comprising information addition requesting means for requesting, in accordance with the degree calculated by the determination means, the addition of information about the entity extracted by the entity extraction means in order to answer the question sentence.

A question field determination method comprising:
a question sentence input step of inputting a question sentence;
an entity extraction step of extracting an entity that is the target of the question from the question sentence input in the question sentence input step;
a perplexity calculation step of calculating, from the question sentence input in the question sentence input step, a perplexity value using a language model that assumes question sentences in a specific field; and
a determination step of determining, based on the entity extraction result of the entity extraction step and the perplexity value calculated in the perplexity calculation step, whether the question sentence input in the question sentence input step is a question sentence in the specific field.