JP2024108744A

JP2024108744A - Embedded Expression Generation System

Info

Publication number: JP2024108744A
Application number: JP2023013273A
Authority: JP
Inventors: 徳浩勝丸; Norihiro KATSUMARU; 智子川瀬; Tomoko Kawase; 颯己島田; Soki Shimada
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2023-01-31
Filing date: 2023-01-31
Publication date: 2024-08-13

Abstract

To obtain embedded expressions that properly express the relationships between entities.
SOLUTION: An embedded expression generation system 1 includes: a language understanding unit that acquires a decoded text by inputting, to a decoding unit, a composite embedded expression obtained by combining a user embedded expression with a user utterance embedded expression obtained by inputting a first user utterance text into an embedding unit, and that performs machine learning of a language model having the embedding unit and the decoding unit, and of the user embedded expression, based on the error between a second user utterance text and the decoded text; an embedded expression acquisition unit that acquires a topic embedded expression by inputting a topic word into the embedding unit; and a relationship learning unit that, for a relationship graph in which users and topics are nodes and edges are set based on users' utterance records, acquires an embedded expression of each node by training a graph neural network that uses the user embedded expressions and the topic embedded expressions as the features of the nodes.
SELECTED DRAWING: Figure 1

Description

本発明は、埋め込み表現生成システムに関する。 The present invention relates to an embedded expression generation system.

抽象テキストを要約するシステムが知られている。例えば、特許文献１に記載されたシステムでは、リカレントニューラルネットワークにより実現されるエンコーダデコーダモデルが用いられ、エンコーダにより文書の入力トークン埋め込みを処理し、デコーダの出力トークンを処理してサマリートークンの出力が得られる。 A system for summarizing abstract text is known. For example, the system described in Patent Document 1 uses an encoder-decoder model implemented by a recurrent neural network, in which the encoder processes input token embedding in a document, and the decoder's output tokens are processed to obtain summary token output.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2021-22404 (JP 2021-22404 A)

適切に構成されたエンコーダデコーダモデルを用いることにより、例えば、話題及び場所等を表す単語の埋め込み表現を得ることはできる。また、同様に、単語とは異なる種別のエンティティである人の埋め込み表現を得ることも可能である。埋め込み表現は実数ベクトルで表現されるので、単語と単語との間及び人と人との間のように同種のエンティティ間の距離を計算することは可能だった。しかしながら、単語及び人のそれぞれの埋め込み表現の生成において、人と単語の関係が学習されていない場合には、人及び単語のそれぞれの埋め込み表現にそれらの関係性が反映されないので、人と単語との間のような異なる種別のエンティティ間の距離を計算することはできなかった。 By using a properly constructed encoder-decoder model, it is possible to obtain embeddings of words that represent topics, places, etc. Similarly, it is also possible to obtain embeddings of people, which are a different type of entity from words. Since the embeddings are represented as real vectors, it was possible to calculate the distance between entities of the same type, such as between words and between people. However, if the relationship between people and words has not been learned in generating the embeddings of words and people, those relationships are not reflected in the embeddings of people and words, and it was not possible to calculate the distance between different types of entities, such as between people and words.

そこで、本発明は、上記問題点に鑑みてなされたものであり、異なるエンティティ間の関係が適切に表されたエンティティの埋め込み表現を得ることを目的とする。 The present invention has been made in consideration of the above problems, and aims to obtain an embedding representation of an entity in which the relationships between different entities are appropriately represented.

In order to solve the above problem, an embedded expression generation system according to one aspect of the present disclosure is a system that generates embedded expressions of at least users and topics, and comprises: a language understanding unit that trains a language model composed of an encoder-decoder model including an embedding unit and a decoding unit, wherein the embedding unit outputs an embedded expression representing the features of input text, the decoding unit decodes an embedded expression that includes at least the output from the embedding unit, and the language understanding unit acquires a user utterance embedded expression output from the embedding unit by inputting to the embedding unit a first user utterance text, which represents the utterance content of one user among the utterance texts representing the content of users' utterances, acquires a decoded text output from the decoding unit by inputting to the decoding unit a composite embedded expression that combines the user utterance embedded expression with a user embedded expression, which is the embedded expression of that one user, and performs machine learning that adjusts the language model and the user embedded expression so that the error between the decoded text and a second user utterance text, which follows the first user utterance text in the utterance text, is reduced, the user embedded expression being an initial user embedded expression before learning or a user embedded expression in the course of learning; a topic extraction unit that extracts, from the utterance text, topic words, which are words or phrases representing topics in the users' utterances; an embedded expression acquisition unit that inputs the topic words to the trained embedding unit and acquires the topic embedded expressions output from the embedding unit; a relationship extraction unit that generates, based on the users' utterance histories and behavior histories, a relationship graph in which at least users and topics are nodes, a record of dialogue between users is an edge connecting those users, and a record of a user uttering a topic word is an edge connecting that user and the topic; a relationship learning unit that obtains a learned embedded expression for each node by training a graph neural network in which the learned user embedded expressions and the topic embedded expressions are used as the features of the user nodes and topic nodes in the relationship graph; and an embedded expression output unit that outputs the embedded expression of each node.

According to the above aspect, the language model composed of the encoder-decoder model is trained using pairs of a first user utterance text and a second user utterance text as training data. The first user utterance text is input to the embedding unit to obtain a user utterance embedded expression, the composite embedded expression combining this user utterance embedded expression with the user embedded expression is input to the decoding unit, and the language model and the user embedded expression are machine-learned so that the error between the decoded text output from the decoding unit and the second user utterance text is reduced. This yields an embedding unit (encoder) that outputs a suitable topic embedded expression in response to an input topic word, as well as a user embedded expression that suitably reflects the user's characteristics. Then, a relationship graph is generated in which users and topics are nodes and edges are set between nodes based on the users' utterance and behavior histories. By training a graph neural network in which the topic embedded expressions obtained by inputting topic words into the embedding unit and the learned user embedded expressions are used as the features of the topic and user nodes, learned topic embedded expressions and user embedded expressions that suitably reflect the characteristics of the topic words and users are obtained. Because the resulting topic embedded expressions and user embedded expressions reflect the relationships between these entities, the distance between a user and a topic can be calculated.

異なるエンティティ間の関係が適切に表されたエンティティの埋め込み表現を得ることが可能となる。 It is possible to obtain embedding representations of entities that properly represent the relationships between different entities.

FIG. 1 is a block diagram showing the functional configuration of the embedded expression generation device of this embodiment.
FIG. 2 is a hardware block diagram of the embedded expression generation device.
FIG. 3 is a diagram outlining the process of acquiring speech text.
FIG. 4 is a diagram showing an example of the configuration of the language model and of the machine learning process of the language model.
FIG. 5 is a diagram showing an example of the embedded expression acquisition process using the embedding unit of the trained language model.
FIG. 6 is a diagram showing an example of edge acquisition for generating the relationship graph.
FIG. 7 is a diagram showing an example of a relationship graph and an example of extracting positive and negative examples from the relationship graph.
FIG. 8 is a diagram showing an example of the embedded expressions of each entity obtained by training the graph neural network that constitutes the relationship graph.
FIG. 9 is a flowchart showing the processing of the embedded expression generation method in the embedded expression generation device.
FIG. 10 is a flowchart showing the processing of the machine learning of the language model.
FIG. 11 is a diagram showing the configuration of the embedded expression generation program.

本発明に係る埋め込み表現生成システムの実施形態について図面を参照して説明する。なお、可能な場合には、同一の部分には同一の符号を付して、重複する説明を省略する。 An embodiment of an embedded expression generation system according to the present invention will be described with reference to the drawings. Where possible, identical parts will be given the same reference numerals and duplicated explanations will be omitted.

図１は、本実施形態に係る埋め込み表現生成システムの機能的構成を示す図である。本実施形態の埋め込み表現生成システム１は、少なくともユーザ及び話題の埋め込み表現を生成するシステムであって、一例として、埋め込み表現生成装置１０により構成される。 Figure 1 is a diagram showing the functional configuration of an embedded expression generation system according to this embodiment. The embedded expression generation system 1 of this embodiment is a system that generates embedded expressions for at least users and topics, and is, as an example, configured by an embedded expression generation device 10.

埋め込み表現生成装置１０は、図１に示すように、機能的には、発話ログ取得部１１、音声認識部１２、テキスト取得部１３、感情取得部１４、言語理解部１５、話題抽出部１６、埋め込み表現取得部１７、関係抽出部１８、関係学習部１９、埋め込み表現出力部２０及びリンク予測部２１を備える。これらの各機能部１１～２１は、図１に例示されるように一つの装置に構成されてもよいし、複数の装置に分散されて構成されてもよい。 As shown in FIG. 1, the embedded expression generation device 10 functionally comprises a speech log acquisition unit 11, a voice recognition unit 12, a text acquisition unit 13, an emotion acquisition unit 14, a language understanding unit 15, a topic extraction unit 16, an embedded expression acquisition unit 17, a relationship extraction unit 18, a relationship learning unit 19, an embedded expression output unit 20, and a link prediction unit 21. Each of these functional units 11 to 21 may be configured in a single device as illustrated in FIG. 1, or may be distributed across multiple devices.

なお、図１に示したブロック図は、機能単位のブロックを示している。これらの機能ブロック（構成部）は、ハードウェア及びソフトウェアの少なくとも一方の任意の組み合わせによって実現される。また、各機能ブロックの実現方法は特に限定されない。すなわち、各機能ブロックは、物理的又は論理的に結合した１つの装置を用いて実現されてもよいし、物理的又は論理的に分離した２つ以上の装置を直接的又は間接的に（例えば、有線、無線などを用いて）接続し、これら複数の装置を用いて実現されてもよい。機能ブロックは、上記１つの装置又は上記複数の装置にソフトウェアを組み合わせて実現されてもよい。 The block diagram shown in FIG. 1 shows functional blocks. These functional blocks (components) are realized by any combination of at least one of hardware and software. Furthermore, the method of realizing each functional block is not particularly limited. That is, each functional block may be realized using one device that is physically or logically coupled, or may be realized using two or more devices that are physically or logically separated and connected directly or indirectly (e.g., using wires, wirelessly, etc.). The functional blocks may be realized by combining the one device or the multiple devices with software.

Functions include, but are not limited to, judging, deciding, determining, calculating, computing, processing, deriving, investigating, searching, ascertaining, receiving, transmitting, outputting, accessing, resolving, selecting, choosing, establishing, comparing, assuming, expecting, considering, broadcasting, notifying, communicating, forwarding, configuring, reconfiguring, allocating, mapping, and assigning. For example, a functional block (component) that performs transmission is called a transmitting unit or a transmitter. In either case, as mentioned above, the method of realization is not particularly limited.

例えば、本発明の一実施の形態における埋め込み表現生成装置１０は、コンピュータとして機能してもよい。図２は、本実施形態に係る埋め込み表現生成装置１０のハードウェア構成の一例を示す図である。埋め込み表現生成装置１０は、物理的には、プロセッサ１００１、メモリ１００２、ストレージ１００３、通信装置１００４、入力装置１００５、出力装置１００６、バス１００７などを含むコンピュータ装置として構成されてもよい。 For example, the embedded expression generation device 10 in one embodiment of the present invention may function as a computer. FIG. 2 is a diagram showing an example of the hardware configuration of the embedded expression generation device 10 according to this embodiment. The embedded expression generation device 10 may be physically configured as a computer device including a processor 1001, a memory 1002, a storage 1003, a communication device 1004, an input device 1005, an output device 1006, a bus 1007, etc.

なお、以下の説明では、「装置」という文言は、回路、デバイス、ユニットなどに読み替えることができる。埋め込み表現生成装置１０のハードウェア構成は、図に示した各装置を１つ又は複数含むように構成されてもよいし、一部の装置を含まずに構成されてもよい。 In the following description, the term "apparatus" may be interpreted as a circuit, device, unit, etc. The hardware configuration of the embedded expression generation device 10 may be configured to include one or more of the devices shown in the figure, or may be configured to exclude some of the devices.

Each function of the embedded expression generation device 10 is realized by loading predetermined software (a program) onto hardware such as the processor 1001 and the memory 1002, so that the processor 1001 performs computations and controls communication by the communication device 1004 and the reading and/or writing of data in the memory 1002 and the storage 1003.

プロセッサ１００１は、例えば、オペレーティングシステムを動作させてコンピュータ全体を制御する。プロセッサ１００１は、周辺装置とのインターフェース、制御装置、演算装置、レジスタなどを含む中央処理装置（ＣＰＵ：Central Processing Unit）で構成されてもよい。例えば、図１に示した各機能部１１～２１などは、プロセッサ１００１で実現されてもよい。 The processor 1001, for example, runs an operating system to control the entire computer. The processor 1001 may be configured as a central processing unit (CPU) that includes an interface with peripheral devices, a control device, an arithmetic unit, a register, etc. For example, each of the functional units 11 to 21 shown in FIG. 1 may be realized by the processor 1001.

また、プロセッサ１００１は、プログラム（プログラムコード）、ソフトウェアモジュールやデータを、ストレージ１００３及び／又は通信装置１００４からメモリ１００２に読み出し、これらに従って各種の処理を実行する。プログラムとしては、上述の実施の形態で説明した動作の少なくとも一部をコンピュータに実行させるプログラムが用いられる。例えば、埋め込み表現生成装置１０の各機能部１１～２１は、メモリ１００２に格納され、プロセッサ１００１で動作する制御プログラムによって実現されてもよい。上述の各種処理は、１つのプロセッサ１００１で実行される旨を説明してきたが、２以上のプロセッサ１００１により同時又は逐次に実行されてもよい。プロセッサ１００１は、１以上のチップで実装されてもよい。なお、プログラムは、電気通信回線を介してネットワークから送信されても良い。 The processor 1001 also reads out programs (program codes), software modules, and data from the storage 1003 and/or the communication device 1004 into the memory 1002, and executes various processes according to these. The programs used are those that cause a computer to execute at least some of the operations described in the above-mentioned embodiments. For example, the functional units 11 to 21 of the embedded expression generation device 10 may be realized by a control program stored in the memory 1002 and operated by the processor 1001. Although the above-mentioned various processes have been described as being executed by one processor 1001, they may be executed simultaneously or sequentially by two or more processors 1001. The processor 1001 may be implemented in one or more chips. The programs may be transmitted from a network via a telecommunications line.

メモリ１００２は、コンピュータ読み取り可能な記録媒体であり、例えば、ＲＯＭ（Read Only Memory）、ＥＰＲＯＭ（Erasable Programmable ＲＯＭ）、ＥＥＰＲＯＭ（Electrically Erasable Programmable ＲＯＭ）、ＲＡＭ（Random Access Memory）などの少なくとも１つで構成されてもよい。メモリ１００２は、レジスタ、キャッシュ、メインメモリ（主記憶装置）などと呼ばれてもよい。メモリ１００２は、本発明の一実施の形態に係る埋め込み表現生成方法を実施するために実行可能なプログラム（プログラムコード）、ソフトウェアモジュールなどを保存することができる。 The memory 1002 is a computer-readable recording medium, and may be composed of at least one of, for example, a read only memory (ROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), a random access memory (RAM), etc. The memory 1002 may also be called a register, a cache, a main memory (primary storage device), etc. The memory 1002 can store executable programs (program codes), software modules, etc. for implementing the embedded representation generation method according to one embodiment of the present invention.

ストレージ１００３は、コンピュータ読み取り可能な記録媒体であり、例えば、ＣＤ－ＲＯＭ（Compact Disc ＲＯＭ）などの光ディスク、ハードディスクドライブ、フレキシブルディスク、光磁気ディスク(例えば、コンパクトディスク、デジタル多用途ディスク、Ｂｌｕ－ｒａｙ（登録商標）ディスク)、スマートカード、フラッシュメモリ(例えば、カード、スティック、キードライブ)、フロッピー（登録商標）ディスク、磁気ストリップなどの少なくとも１つで構成されてもよい。ストレージ１００３は、補助記憶装置と呼ばれてもよい。上述の記憶媒体は、例えば、メモリ１００２及び／又はストレージ１００３を含むデータベース、サーバその他の適切な媒体であってもよい。 Storage 1003 is a computer-readable recording medium, and may be, for example, at least one of an optical disk such as a CD-ROM (Compact Disc ROM), a hard disk drive, a flexible disk, a magneto-optical disk (e.g., a compact disk, a digital versatile disk, a Blu-ray (registered trademark) disk), a smart card, a flash memory (e.g., a card, a stick, a key drive), a floppy (registered trademark) disk, a magnetic strip, and the like. Storage 1003 may also be referred to as an auxiliary storage device. The above-mentioned storage medium may be, for example, a database, a server, or other suitable medium including memory 1002 and/or storage 1003.

通信装置１００４は、有線及び／又は無線ネットワークを介してコンピュータ間の通信を行うためのハードウェア（送受信デバイス）であり、例えばネットワークデバイス、ネットワークコントローラ、ネットワークカード、通信モジュールなどともいう。 The communication device 1004 is hardware (transmitting/receiving device) for communicating between computers via a wired and/or wireless network, and is also referred to as, for example, a network device, a network controller, a network card, a communication module, etc.

入力装置１００５は、外部からの入力を受け付ける入力デバイス（例えば、キーボード、マウス、マイクロフォン、スイッチ、ボタン、センサなど）である。出力装置１００６は、外部への出力を実施する出力デバイス（例えば、ディスプレイ、スピーカー、LEDランプなど）である。なお、入力装置１００５及び出力装置１００６は、一体となった構成（例えば、タッチパネル）であってもよい。 The input device 1005 is an input device (e.g., a keyboard, a mouse, a microphone, a switch, a button, a sensor, etc.) that accepts input from the outside. The output device 1006 is an output device (e.g., a display, a speaker, an LED lamp, etc.) that performs output to the outside. Note that the input device 1005 and the output device 1006 may be integrated into one configuration (e.g., a touch panel).

また、プロセッサ１００１やメモリ１００２などの各装置は、情報を通信するためのバス１００７で接続される。バス１００７は、単一のバスで構成されてもよいし、装置間で異なるバスで構成されてもよい。 In addition, each device, such as the processor 1001 and memory 1002, is connected by a bus 1007 for communicating information. The bus 1007 may be configured as a single bus, or may be configured as different buses between the devices.

また、埋め込み表現生成装置１０は、マイクロプロセッサ、デジタル信号プロセッサ（ＤＳＰ：Digital Signal Processor）、ＡＳＩＣ（Application Specific Integrated Circuit）、ＰＬＤ（Programmable Logic Device）、ＦＰＧＡ（Field Programmable Gate Array）などのハードウェアを含んで構成されてもよく、当該ハードウェアにより、各機能ブロックの一部又は全てが実現されてもよい。例えば、プロセッサ１００１は、これらのハードウェアの少なくとも１つで実装されてもよい。 The embedded expression generation device 10 may also be configured to include hardware such as a microprocessor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a programmable logic device (PLD), or a field programmable gate array (FPGA), and some or all of the functional blocks may be realized by the hardware. For example, the processor 1001 may be implemented by at least one of these pieces of hardware.

Next, each functional unit of the embedded expression generation device 10 will be described. The speech log acquisition unit 11 acquires a speech log representing the content of a user's utterance. The voice recognition unit 12 converts the speech log into text when the speech log is voice. The text acquisition unit 13 acquires, based on the speech log, speech text, which is text representing the content of the user's utterance. The emotion acquisition unit 14 acquires emotion information representing the user's emotion at the time of the utterance, based on the voice of the utterance or the user's facial expression, and associates the acquired emotion information with the speech text representing the content of that utterance.

図３を参照して、発話ログ取得部１１、音声認識部１２、テキスト取得部１３及び感情取得部１４の処理内用を具体的に説明する。図３は、発話テキストの取得工程を概略的に説明する図である。 The processing operations of the speech log acquisition unit 11, the voice recognition unit 12, the text acquisition unit 13, and the emotion acquisition unit 14 will be specifically described with reference to FIG. 3. FIG. 3 is a diagram that outlines the process of acquiring speech text.

発話ログ取得部１１は、例えばキーボード及びタッチパネル等に例示される入力装置４１を介した入力に基づいて、ユーザの発話の内容を表す発話ログをテキストの態様で取得してもよい。また、発話ログ取得部１１は、例えばマイク４２を介した音声入力に基づいて、ユーザの発話内用を表す発話ログを、音声データの態様で取得してもよい。 The speech log acquisition unit 11 may acquire a speech log representing the contents of the user's speech in the form of text, based on input via an input device 41, for example, a keyboard, a touch panel, or the like. The speech log acquisition unit 11 may also acquire a speech log representing the contents of the user's speech in the form of audio data, based on audio input via a microphone 42, for example.

発話ログ取得部１１により取得される発話ログは、所定の仮想空間におけるユーザの発話内容を表す音声またはテキスト（チャット）であってもよい。所定の仮想空間は、一例として、いわゆるメタバースと言われる仮想空間であってもよい。ユーザによる発話は、メタバース等の仮想空間におけるアバターによる発話であってもよく、発話ログ取得部１１は、アバターによる発話を表す発話ログを、音声又はテキストの態様で取得してもよい。 The speech log acquired by the speech log acquisition unit 11 may be voice or text (chat) representing the contents of the user's speech in a specified virtual space. The specified virtual space may be, for example, a virtual space known as the metaverse. The user's speech may be speech by an avatar in a virtual space such as the metaverse, and the speech log acquisition unit 11 may acquire the speech log representing the speech by the avatar in the form of voice or text.

The voice recognition unit 12 converts voice into text when the speech log acquisition unit 11 has acquired a speech log in the form of voice. The voice recognition unit 12 may convert a voice speech log into text by any method, for example, by well-known speech recognition technology.

テキスト取得部１３は、発話ログに基づいて、ユーザの発話の内容を表すテキストである発話テキストを取得する。発話ログ取得部１１によりテキストの態様で発話ログが取得される場合には、テキスト取得部１３は、発話ログを表すテキストを発話テキストとして取得する。また、発話ログ取得部１１により音声の態様で発話ログが取得される場合には、テキスト取得部１３は、音声認識部１２によりテキストに変換された発話ログを発話テキストとして取得する。そして、テキスト取得部１３は、取得した発話テキストｔ１を言語理解部１５に送出する。 The text acquisition unit 13 acquires speech text, which is text that represents the content of the user's utterance, based on the speech log. When the speech log acquisition unit 11 acquires the speech log in the form of text, the text acquisition unit 13 acquires the text that represents the speech log as the speech text. When the speech log acquisition unit 11 acquires the speech log in the form of speech, the text acquisition unit 13 acquires the speech log converted into text by the speech recognition unit 12 as the speech text. Then, the text acquisition unit 13 sends the acquired speech text t1 to the language understanding unit 15.
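
The branching described above (use the log as-is when it is text, otherwise use the recognizer's output) can be sketched as follows. This is only an illustrative sketch: the class names `TextLog` and `VoiceLog`, the function `acquire_speech_text`, and the stand-in recognizer are hypothetical and do not appear in the source, which leaves the recognition method open.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class TextLog:
    """Speech log acquired in the form of text (e.g. keyboard or chat input)."""
    user_id: str
    text: str


@dataclass
class VoiceLog:
    """Speech log acquired in the form of voice data (e.g. microphone input)."""
    user_id: str
    audio: bytes


SpeechLog = Union[TextLog, VoiceLog]


def acquire_speech_text(log: SpeechLog, recognize: Callable[[bytes], str]) -> str:
    """Return the speech text for a log, running recognition only for voice logs."""
    if isinstance(log, TextLog):
        return log.text
    return recognize(log.audio)


def fake_recognizer(audio: bytes) -> str:
    # Hypothetical stand-in for a real speech recognizer: it simply decodes
    # the bytes as UTF-8 so that the sketch can run without audio data.
    return audio.decode("utf-8")


t1 = acquire_speech_text(VoiceLog("A", b"Curry"), fake_recognizer)
```

In a real system the `recognize` callable would wrap whatever speech recognition technology is chosen.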

感情取得部１４は、ユーザの発話が発せられた時の当該ユーザの感情を表す感情情報を、例えば、マイク４２を介して取得されるユーザの発話音声、又は、カメラ４３を介して取得されるユーザの表情を表す画像に基づいて取得する。 The emotion acquisition unit 14 acquires emotion information that represents the emotion of the user when the user speaks, for example, based on the user's speech acquired via the microphone 42 or an image representing the user's facial expression acquired via the camera 43.

感情取得部１４は、如何なる手法により発話音声からユーザの感情情報を取得してもよく、例えば、周知の感情認識技術により発話音声から感情情報を取得してもよい。また、感情取得部１４は、如何なる手法によりユーザの表情を表す画像からユーザの感情情報を取得してもよく、例えば、周知の表情認識技術によりユーザの表情を表す画像から感情情報を取得してもよい。 The emotion acquisition unit 14 may acquire the user's emotion information from the spoken voice by any method, for example, may acquire the emotion information from the spoken voice by well-known emotion recognition technology. The emotion acquisition unit 14 may also acquire the user's emotion information from an image showing the user's facial expression by any method, for example, may acquire the emotion information from an image showing the user's facial expression by well-known facial expression recognition technology.

また、感情情報の取得源はユーザの表情及び発話音声に限定されず、感情取得部１４は、仮想空間におけるユーザの発話時のアバターの状態から取得してもよい。 In addition, the source of emotion information is not limited to the user's facial expressions and spoken voice, and the emotion acquisition unit 14 may acquire the emotion information from the state of the avatar when the user speaks in the virtual space.

感情情報は、例えば、「喜び」、「怒り」、「悲しみ」、「驚き」等の種別を含み、「楽しい」、「穏やか」等のいくつかの所定の感情の種別は、ポジティブ（肯定的）な感情として分類されうる。 Emotion information includes categories such as "joy," "anger," "sadness," and "surprise," and some predetermined emotion categories such as "fun" and "calm" can be classified as positive emotions.

感情取得部１４は、ユーザの発話時の表情及び音声等から取得した感情情報を、当該発話の内容を表す発話テキストｔ１に関連付ける。従って、言語理解部１５は、感情情報が関連付けられた発話テキストｔ１を取得できる。 The emotion acquisition unit 14 associates emotion information acquired from the user's facial expression and voice when speaking with the spoken text t1 that represents the content of the utterance. Therefore, the language understanding unit 15 can acquire the spoken text t1 associated with the emotion information.
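
One way to picture the association between emotion information and speech text is a small record type. The sketch below is an assumption for illustration only: `SpeechText`, `attach_emotion`, `is_positive` and the contents of `POSITIVE_EMOTIONS` are hypothetical names, and the set lists only the two positive categories the text gives as examples.

```python
from dataclasses import dataclass
from typing import Optional

# Emotion categories treated as positive. Only "fun" and "calm" are named in
# the text; a real system would define its own predetermined set.
POSITIVE_EMOTIONS = {"fun", "calm"}


@dataclass(frozen=True)
class SpeechText:
    user_id: str
    text: str
    emotion: Optional[str] = None  # None until emotion information is attached


def attach_emotion(speech: SpeechText, emotion: str) -> SpeechText:
    """Return a copy of the speech text with emotion information associated."""
    return SpeechText(speech.user_id, speech.text, emotion)


def is_positive(speech: SpeechText) -> bool:
    """True when the associated emotion belongs to the positive categories."""
    return speech.emotion in POSITIVE_EMOTIONS


t1 = attach_emotion(SpeechText("A", "Curry"), "fun")
```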

言語理解部１５は、エンコーダデコーダモデルにより構成される言語モデルの機械学習を実施する。図４は、言語モデルの構成及び言語モデルの機械学習処理の例を示す図である。言語モデルｍｄは、ニューラルネットワークを含んで構成されるエンコーダデコーダモデルであって、埋め込み部ｅｎ（エンコーダ）及び復号部ｄｅ（デコーダ）を含む。 The language understanding unit 15 performs machine learning of a language model configured by an encoder-decoder model. FIG. 4 is a diagram showing an example of the configuration of a language model and a machine learning process of the language model. The language model md is an encoder-decoder model configured including a neural network, and includes an embedding unit en (encoder) and a decoding unit de (decoder).

言語モデルｍｄの構成は限定されないが、例えば、ｓｅｑ２ｓｅｑといったリカレントニューラルネットワークの対から構成されるエンコーダデコーダモデルであってもよいし、例えば、Ｔ５（Ｔｅｘｔ－ｔｏ－ＴｅｘｔＴｒａｎｓｆｅｒＴｒａｎｓｆｏｒｍｅｒ）といったトランスフォーマにより構成されてもよい。 The configuration of the language model md is not limited, but may be, for example, an encoder-decoder model composed of a pair of recurrent neural networks such as seq2seq, or may be composed of a transformer such as T5 (Text-to-Text Transfer Transformer).

The embedding unit en encodes the input text and outputs an embedded expression representing the features of that text. The decoding unit de decodes an embedded expression that includes at least the output from the embedding unit en, and outputs a decoded text dt. Note that, in the description of the inputs and outputs of the language model, anything referred to as "text" may be vector data into which text has been converted by a predetermined method, or may be output as vector data representing text.

The language understanding unit 15 inputs a first user utterance text, which represents the utterance content of one user among the utterance texts representing the content of users' utterances, to the embedding unit en, and thereby acquires the user utterance embedded expression output from the embedding unit en.

図４に示す例では、言語理解部１５は、言語モデルｍｄの学習のための教師データである、ユーザＡの発話の内容を表す発話テキストｕｔ（「今日の晩御飯は」「カレー」）のうちの第１のユーザ発話テキストｕｔ１（今日の晩御飯は）を埋め込み部ｅｎに入力する。そして、言語理解部１５は、埋め込み部ｅｎによりエンコード及び出力されたユーザ発話埋め込み表現ｅｂｓを取得する。 In the example shown in FIG. 4, the language understanding unit 15 inputs the first user utterance text ut1 (What's for dinner tonight?) from the utterance text ut ("What's for dinner tonight?", "Curry") representing the content of the utterance of user A, which is teacher data for learning the language model md, to the embedding unit en. Then, the language understanding unit 15 acquires the user utterance embedded expression ebs encoded and output by the embedding unit en.
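
The teacher data in this example is a segment of utterance text together with the segment that follows it. A minimal sketch of forming such pairs is shown below; pairing every consecutive segment is an assumption made for illustration, and `make_training_pairs` is a hypothetical helper name, not something defined in the source.

```python
from typing import List, Tuple


def make_training_pairs(utterances: List[str]) -> List[Tuple[str, str]]:
    """Pair each utterance segment with the segment that immediately follows it."""
    return list(zip(utterances, utterances[1:]))


# The utterance text ut of user A from the example in FIG. 4.
ut = ["What's for dinner tonight?", "Curry"]
pairs = make_training_pairs(ut)
```

Here `pairs` holds a single pair whose first element plays the role of ut1, the text fed to the embedding unit en.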

ここで、言語理解部１５は、ユーザの埋め込み表現であるユーザ埋め込み表現を取得する。例えば、埋め込み表現生成システム１は、ユーザ埋め込み表現管理部２２を更に備えてもよい。ユーザ埋め込み表現管理部２２は、学習前の初期のユーザ埋め込み表現を生成及び管理してもよい。また、ユーザ埋め込み表現管理部２２は、学習過程のユーザ埋め込み表現を管理してもよい。ユーザ埋め込み表現管理部２２は、図１に示した埋め込み表現生成装置１０の機能部として構成されてもよいし、別途の装置に構成されてもよい。 Here, the language understanding unit 15 acquires user embedded expressions, which are embedded expressions of the user. For example, the embedded expression generation system 1 may further include a user embedded expression management unit 22. The user embedded expression management unit 22 may generate and manage initial user embedded expressions before learning. The user embedded expression management unit 22 may also manage user embedded expressions during the learning process. The user embedded expression management unit 22 may be configured as a functional unit of the embedded expression generation device 10 shown in FIG. 1, or may be configured in a separate device.

ユーザ埋め込み表現は、実数ベクトルにより表される。初期のユーザ埋め込み表現は、ランダムな実数ベクトルであってもよいし、ユーザに関しての何らかの特徴が反映された特徴量からなる実数ベクトルであってもよい。本実施形態の埋め込み表現生成システム１においては、初期のユーザ埋め込み表現を得る方法は限定されず、周知のいかなる手法であってもよい。 The user embedded representation is represented by a real vector. The initial user embedded representation may be a random real vector, or may be a real vector consisting of feature quantities that reflect some characteristic of the user. In the embedded representation generation system 1 of this embodiment, the method of obtaining the initial user embedded representation is not limited, and any well-known method may be used.
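
As one possible reading of the "random real vector" option, the initial user embedded expressions could be generated as in the sketch below. The function name `init_user_embeddings`, the dimension, the value range and the fixed seed are all assumptions chosen for illustration; the source does not specify any of them.

```python
import random
from typing import Dict, List, Sequence


def init_user_embeddings(
    user_ids: Sequence[str], dim: int, seed: int = 0
) -> Dict[str, List[float]]:
    """Create one random real vector of length `dim` per user."""
    rng = random.Random(seed)  # seeded so that the sketch is reproducible
    return {
        uid: [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        for uid in user_ids
    }


user_embeddings = init_user_embeddings(["A", "B"], dim=4)
```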

The language understanding unit 15 generates a composite embedded expression by combining the user utterance embedded expression with the user embedded expression of the one user. The language understanding unit 15 may generate the composite embedded expression by concatenating the user utterance embedded expression and the user embedded expression. In the example shown in FIG. 4, the language understanding unit 15 acquires the user embedded expression ebu of user A from the user embedded expression management unit 22, and concatenates the user utterance embedded expression ebs, which is the embedded expression of the first user utterance text ut1, with the user embedded expression ebu of user A to generate the composite embedded expression ebl. The language understanding unit 15 then inputs the composite embedded expression ebl to the decoding unit de and acquires the decoded text dt decoded by the decoding unit de.
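The concatenation described above can be sketched as follows. This is a minimal illustration only: the vector sizes, the numeric values, and the function name `compose_embedding` are assumptions and do not appear in the specification.

```python
import numpy as np

def compose_embedding(utterance_emb: np.ndarray, user_emb: np.ndarray) -> np.ndarray:
    """Concatenate the utterance embedding (ebs) and the user embedding (ebu) into ebl."""
    return np.concatenate([utterance_emb, user_emb])

# Hypothetical values: in the described system, ebs would come from the embedding
# unit en and ebu from the user embedded expression management unit 22.
ebs = np.array([0.2, -0.1, 0.7])
ebu = np.array([0.5, 0.3])
ebl = compose_embedding(ebs, ebu)
print(ebl.shape)
```

With concatenation, the decoding unit receives a vector whose length is the sum of the two input lengths, so its input size has to be chosen accordingly.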

The language understanding unit 15 performs machine learning that adjusts the language model and the user embedded expression so as to reduce the error between the decoded text and the second user utterance text, which follows the first user utterance text in the utterance text. In the example shown in FIG. 4, the language understanding unit 15 adjusts the language model md and the user embedded expression ebu so as to reduce the error between the decoded text dt and the second user utterance text ut2 ("Curry"), which follows the first user utterance text ut1 in the utterance text ut ("What's for dinner tonight?", "Curry").
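As a rough sketch of this joint adjustment, the toy example below updates both a decoder and the user embedded expression by gradient descent on a squared error. It rests on strong assumptions that are not in the specification: the decoder is replaced by a single linear map `W`, the second user utterance text is replaced by a fixed target vector, the utterance embedding is held fixed (in the described system the embedding unit is trained as well), and the learning rate and sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
ebs = rng.normal(size=3)             # utterance embedding, held fixed in this toy
ebu = rng.normal(size=2)             # user embedding, trainable
W = rng.normal(size=(4, 5)) * 0.5    # toy linear stand-in for the decoding unit
target = rng.normal(size=4)          # toy stand-in for the second utterance text

def loss_fn(W, ebu):
    """Half squared error between the toy decoder output and the target."""
    z = np.concatenate([ebs, ebu])
    err = W @ z - target
    return 0.5 * float(err @ err)

initial_loss = loss_fn(W, ebu)
lr = 0.01
for _ in range(200):
    z = np.concatenate([ebs, ebu])
    err = W @ z - target
    grad_W = np.outer(err, z)                # gradient w.r.t. the decoder weights
    grad_ebu = (W.T @ err)[ebs.size:]        # gradient w.r.t. the user-embedding slice
    W -= lr * grad_W
    ebu -= lr * grad_ebu
final_loss = loss_fn(W, ebu)
print(initial_loss, final_loss)
```

The point of the sketch is only that the error signal flows back through the composite embedding into the user-specific slice, so the user embedding is learned together with the model parameters.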

The language understanding unit 15 may perform the machine learning that adjusts the language model md and the user embedded expression using utterance text associated with emotion information representing a predetermined positive emotion. As described above, the utterance text ut may be accompanied by emotion information representing the user's emotion at the time the utterance was made. In such a case, the language understanding unit 15 may use, as teacher data, utterance text ut associated with emotion information representing a positive emotion such as "fun" or "calm".
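Selecting teacher data by emotion could look like the sketch below. The record layout, the field names, and the sample utterances are invented for illustration; only the idea of keeping utterance text tagged with a positive emotion such as "fun" or "calm" comes from the description.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    user: str
    first_text: str    # first user utterance text
    second_text: str   # second user utterance text that follows it
    emotion: str       # emotion information associated with the utterance

# Assumed set of labels treated as positive.
POSITIVE_EMOTIONS = {"fun", "calm"}

def select_training_pairs(utterances):
    """Keep only the (user, first, second) triples tagged with a positive emotion."""
    return [
        (u.user, u.first_text, u.second_text)
        for u in utterances
        if u.emotion in POSITIVE_EMOTIONS
    ]

log = [
    Utterance("A", "I went hiking today", "The view was great", "fun"),
    Utterance("A", "I missed the train", "Again", "angry"),
]
pairs = select_training_pairs(log)
print(pairs)
```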

Using utterance text associated with emotion information representing positive emotions in this way makes it possible to use, as teacher data, combinations of first and second user utterance texts that are likely to occur when the user is feeling positive. Machine learning with such teacher data yields user embedded expressions and an embedding unit capable of generating topic embedded expressions that reflect relationships with topic words and the like that are favorable for the user.

The language model md, a model that includes a trained neural network, can be regarded as a program that is read or referenced by a computer and that causes the computer to execute predetermined processing and realize predetermined functions.

That is, the trained language model md of this embodiment is used in a computer equipped with a CPU and memory. Specifically, in accordance with instructions from the trained language model md stored in the memory, the CPU performs calculations on the input data supplied to the input layer of the neural network, based on, for example, the trained weighting coefficients (parameters) and response functions corresponding to each layer, and outputs the results (probabilities) from the output layer.

Referring again to FIG. 1, the topic extraction unit 16 extracts topic words, which are words or phrases representing the topics of the user's utterances, from the utterance text. The method used to extract topic words is not limited; the topic extraction unit 16 can extract them using well-known methods such as morphological analysis and text mining.

The embedded expression acquisition unit 17 inputs a topic word to the trained embedding unit and acquires the topic embedded expression output from the embedding unit. FIG. 5 is a diagram showing an example of embedded expression acquisition processing using the embedding unit of the trained language model. As shown in FIG. 5, the embedded expression acquisition unit 17 acquires a topic embedded expression ebt by inputting a topic word tp extracted by the topic extraction unit 16 to the trained embedding unit en. In response to the input of a topic word, the trained embedding unit en can output a suitable topic embedded expression that appropriately reflects the characteristics of the topic.

The embedded expression acquisition unit 17 may further acquire a location embedded expression output from the embedding unit en by inputting location text representing a location to the trained embedding unit en. The location text may be, for example, the name of the location or a description of the location. This yields a location embedded expression that suitably reflects the characteristics of the location.

The relationship extraction unit 18 generates a relationship graph having at least users and topics as nodes, based on the users' utterance history (utterance log) and behavior history. The relationship extraction unit 18 may also generate a relationship graph that further includes locations as nodes.

The relationship extraction unit 18 extracts relationships between nodes based on records of the users' utterances, behavior, and the like, and adds edges based on the extracted relationships. In this embodiment, the relationship extraction unit 18 generates the relationship graph based on the users' utterance history and behavior history in a predetermined virtual space.

FIG. 6 is a diagram showing an example of acquiring edges for generating the relationship graph. As shown in FIG. 6, the relationship extraction unit 18 acquires the users' utterance history hs (utterance logs, utterance text, and the like) in a virtual space such as a metaverse. From the utterance history hs, the relationship extraction unit 18 extracts a record r1 of dialogue between users and assigns it as an edge ed1 between the nodes of those users in the relationship graph.

The relationship extraction unit 18 also extracts, from the utterance history hs, a record r2 of a user uttering a topic word, and assigns it as an edge ed2 connecting the node of that user and the node of that topic word.

Furthermore, the relationship extraction unit 18 acquires the users' behavior history ha in the virtual space. From the behavior history ha, the relationship extraction unit 18 extracts a record r3 of a user visiting a location, and assigns it as an edge ed3 connecting the node of that user and the node of that location.
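The three kinds of edges described above (ed1 from dialogue records r1, ed2 from topic-word utterance records r2, ed3 from visit records r3) can be sketched as a small graph builder. The input format, the typed node tuples, and the sample names are assumptions made for illustration; the specification does not prescribe a data structure.

```python
def build_relationship_graph(dialogues, topic_mentions, visits):
    """Return (nodes, edges); nodes are typed tuples such as ("user", "A")."""
    edges = set()
    for a, b in dialogues:                 # r1 -> ed1: user-user edge
        edges.add(frozenset([("user", a), ("user", b)]))
    for user, topic in topic_mentions:     # r2 -> ed2: user-topic edge
        edges.add(frozenset([("user", user), ("topic", topic)]))
    for user, location in visits:          # r3 -> ed3: user-location edge
        edges.add(frozenset([("user", user), ("location", location)]))
    nodes = {node for edge in edges for node in edge}
    return nodes, edges

# Hypothetical records extracted from the histories hs and ha.
nodes, edges = build_relationship_graph(
    dialogues=[("A", "B")],
    topic_mentions=[("A", "curry")],
    visits=[("B", "plaza")],
)
print(sorted(nodes))
```

Storing each edge as a `frozenset` keeps the graph undirected and removes duplicate records of the same relationship.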

The relationship learning unit 19 obtains a learned embedded expression for each node by training a graph neural network in which the learned user embedded expressions and the topic embedded expressions serve as the features of the user nodes and topic nodes, respectively, in the relationship graph.

For a relationship graph that further includes location nodes, the relationship learning unit 19 may obtain the learned embedded expression of each node by training the graph neural network of the relationship graph with the location embedded expressions as the features of the location nodes.

Specifically, the relationship learning unit 19 associates the learned user embedded expressions ebu obtained through machine learning by the language understanding unit 15, and the topic embedded expressions ebt acquired by the embedded expression acquisition unit 17, as features with the user nodes and topic nodes of the relationship graph, respectively. The relationship learning unit 19 also associates the location embedded expressions acquired by the embedded expression acquisition unit 17 as features with the location nodes of the relationship graph.

The relationship learning unit 19 then trains the graph neural network of the relationship graph, in which the embedded expressions are the features of the nodes, thereby updating the features and weights of each node and obtaining the learned embedded expression of each node.

The relationship learning unit 19 can train on the relationship graph using a well-known graph neural network learning method. Learning on the relationship graph is outlined below with reference to FIG. 7, which shows an example of a relationship graph and an example of extracting positive and negative examples from it.

The relationship graph gn illustrated in FIG. 7 includes nodes n1 to n5, each corresponding to a user, a topic, or a location. The relationship learning unit 19 randomly samples a node of interest. In the example shown in FIG. 7, node n2 is assumed to have been sampled as the node of interest.

The relationship learning unit 19 extracts a positive example graph g1 and a negative example graph g2 from the relationship graph gn. The positive example graph g1 includes node n2, which is the node of interest, and nodes n1 and n5, which are connected to node n2 by edges. The negative example graph g2 includes node n2 and nodes n3 and n4, which are not connected to node n2 by edges. The negative example graph g2 need not include every node that is not connected to the node of interest.
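The extraction of positive and negative examples around a node of interest can be sketched as follows. The adjacency dictionary reproduces only what the text states about FIG. 7 (n2 is connected to n1 and n5, and not to n3 or n4); the edge between n3 and n4, the function name, and the number of negatives drawn are assumptions.

```python
import random

def split_neighbors(adjacency, focus, num_negative, rng):
    """Return (positives, negatives) for the node of interest `focus`."""
    positives = sorted(adjacency[focus])
    candidates = sorted(
        n for n in adjacency if n != focus and n not in adjacency[focus]
    )
    # The negative example graph need not contain every unconnected node,
    # so only a sample of the candidates is drawn.
    negatives = rng.sample(candidates, min(num_negative, len(candidates)))
    return positives, negatives

adjacency = {
    "n1": {"n2"},
    "n2": {"n1", "n5"},
    "n3": {"n4"},   # hypothetical edge, not stated in the text
    "n4": {"n3"},
    "n5": {"n2"},
}
rng = random.Random(0)
# In the described method the node of interest is sampled at random;
# it is fixed to n2 here to mirror the FIG. 7 example.
positives, negatives = split_neighbors(adjacency, "n2", 2, rng)
print(positives, sorted(negatives))
```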

An example of learning on the relationship graph gn is described below. Because the learning process of a graph neural network is a well-known technique, the description is kept brief.

First, learning on the positive example graph g1 is described. Based on the positive example graph g1, the relationship learning unit 19 extracts an adjacency matrix A whose rows and columns correspond to the nodes included in the graph and whose elements express the edge connections with node n2, the node of interest.

The relationship learning unit 19 also extracts a diagonal matrix I whose rows and columns correspond to the nodes included in the graph and whose elements express the self-loop of each node. Letting the node feature X denote the real-valued vectors representing the features of the nodes, the feature of each node is expressed by the following formula as the sum (convolution) of the features of the connected nodes, expressed by the adjacency matrix A, and the feature of the node itself, expressed by the diagonal matrix I.
(A + I)・X
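A small worked instance of the aggregation (A + I)・X for the positive example graph g1 is shown below. The node ordering [n2, n1, n5] and the two-dimensional feature values are invented for illustration, and I is taken to be the identity matrix.

```python
import numpy as np

# Nodes of the positive example graph g1, in the assumed order [n2, n1, n5].
A = np.array([
    [0, 1, 1],   # n2 is connected to n1 and n5
    [1, 0, 0],   # n1 is connected to n2
    [1, 0, 0],   # n5 is connected to n2
])
I = np.eye(3, dtype=int)   # self-loops
X = np.array([
    [1, 0],      # hypothetical feature of n2
    [0, 1],      # hypothetical feature of n1
    [1, 1],      # hypothetical feature of n5
])

aggregated = (A + I) @ X
print(aggregated)
# [[2 2]
#  [1 1]
#  [2 1]]
```

Each output row is the node's own feature plus the features of its neighbors: the row for n2 sums all three feature vectors, while the rows for n1 and n5 each add only the feature of n2 to their own.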

As expressed by the following formula, the relationship learning unit 19 multiplies the convolved feature of each node by a weight W and inputs the result to an activation function f to obtain an output H.
H (positive example) = f((A + I)・X・W)
The relationship learning unit 19 then learns the weights and features so that the output H (positive example) obtained from the positive example graph g1 becomes 1.

The relationship learning unit 19 similarly obtains an output H (negative example) from the negative example graph g2, and learns the weights and features so that the output H (negative example) becomes 0.
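The forward computation H = f((A + I)・X・W) and the two training targets can be sketched as below. Several choices here are assumptions rather than part of the description: f is taken to be the logistic sigmoid, W maps each node feature to a single scalar, the error is a mean squared difference from the target, and the negative example graph is given an all-zero adjacency matrix because its nodes are not connected to the node of interest.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gnn_output(A, X, W):
    """One graph-convolution step: H = f((A + I) X W) with f = sigmoid."""
    I = np.eye(A.shape[0])
    return sigmoid((A + I) @ X @ W)

def squared_error(H, target):
    """Mean squared difference between every output element and the target."""
    return float(np.mean((H - target) ** 2))

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 1))                 # shared weight, hypothetical shape

A_pos = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)   # n2, n1, n5
X_pos = rng.normal(size=(3, 2))
A_neg = np.zeros((3, 3))                    # n2, n3, n4: no edges to n2
X_neg = rng.normal(size=(3, 2))

H_pos = gnn_output(A_pos, X_pos, W)
H_neg = gnn_output(A_neg, X_neg, W)
# Training would lower this value: H_pos is pushed toward 1, H_neg toward 0.
loss = squared_error(H_pos, 1.0) + squared_error(H_neg, 0.0)
print(loss)
```

In an actual training loop, the gradient of this loss would be used to update both W and the node features X, which is how the node embedded expressions themselves are learned.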

Referring again to FIG. 1, the embedded expression output unit 20 outputs the embedded expression of each node obtained through learning by the relationship learning unit 19. FIG. 8 is a diagram showing an example of the embedded expressions of the entities obtained by training the graph neural network on the relationship graph. As shown in FIG. 8, the embedded expression output unit 20 outputs the embedded expressions EB of entities 1, 2, 3, 4, 5, and so on, which correspond to the nodes of the relationship graph gn and result from the graph neural network learning gm performed on the relationship graph gn by the relationship learning unit 19.

The embedded expression of each node obtained in this way is a real-valued vector that suitably reflects the characteristics of the corresponding entity as well as the relationships between entities, so the distance between entities can be calculated.
Therefore, although the nodes in the relationship graph correspond to entities of different types, such as users, topics, and locations, the distance between entities of different types can be calculated.

The manner in which the embedded expression output unit 20 outputs the embedded expressions is not limited; the output may be, for example, storage in a predetermined storage means, transmission to a predetermined device, or display on a predetermined display device.

Referring again to FIG. 1, the link prediction unit 21 calculates the distances between nodes based on the learned embedded expression of each node and, based on the calculated distances, calculates link prediction information indicating the likelihood that an edge is established between nodes.

Specifically, the link prediction unit 21 determines, for example, whether the distance between nodes, calculated as the distance between their real-valued vectors, is less than or equal to a given threshold. When it determines that the distance is less than or equal to the threshold, the link prediction unit 21 outputs link prediction information indicating that an edge is predicted to exist between those nodes.
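The threshold test can be sketched as follows. The text only speaks of "the distance between real vectors", so the use of Euclidean distance is an assumption, as are the node labels, the embedding values, and the threshold of 1.0.

```python
from itertools import combinations

import numpy as np

def predict_links(embeddings, threshold):
    """Return (node_a, node_b, distance) for every pair within the threshold."""
    predicted = []
    for a, b in combinations(sorted(embeddings), 2):
        distance = float(np.linalg.norm(embeddings[a] - embeddings[b]))
        if distance <= threshold:
            predicted.append((a, b, distance))
    return predicted

# Hypothetical learned embedded expressions of three entities of different types.
embeddings = {
    "user:A": np.array([0.0, 0.0]),
    "topic:curry": np.array([0.3, 0.4]),
    "location:plaza": np.array([3.0, 4.0]),
}
links = predict_links(embeddings, threshold=1.0)
print(links)
```

Because all entity types share one vector space after graph learning, the same distance function applies to user-topic, user-location, and user-user pairs alike.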

In this way, the graph neural network learning gm on the relationship graph gn yields embedded expressions, represented as real-valued vectors, with which the distance between entities of different types can be calculated. Link prediction information can therefore be calculated to evaluate the likelihood that an edge is established between nodes of the graph, which makes it possible to predict that the entities corresponding to those nodes are related to at least a certain degree.

In addition, based on a given threshold for the distance between nodes, the link prediction unit 21 outputs, as link prediction information, information indicating the nodes whose mutual distance is less than or equal to the threshold.

Specifically, the link prediction unit 21 determines, for example, whether the distance between nodes, calculated as the distance between their real-valued vectors, is less than or equal to a given threshold, and outputs, as link prediction information, information indicating the entities corresponding to the nodes whose distance was determined to be less than or equal to the threshold. When at least one of those entities is a user, information indicating the other entity may be provided to that user as recommendation information.
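Turning predicted links into recommendation information could be sketched as below. The "user:" prefix convention for identifying user entities and the sample pairs are assumptions made for this illustration.

```python
def recommendations(predicted_pairs):
    """Map each user entity to the other entities it is predicted to link to."""
    recs = {}
    for a, b in predicted_pairs:
        # Check both orientations so that user-user pairs recommend in both directions.
        for user, other in ((a, b), (b, a)):
            if user.startswith("user:"):
                recs.setdefault(user, []).append(other)
    return recs

# Hypothetical pairs whose distance was at or below the threshold.
recs = recommendations([
    ("topic:curry", "user:A"),
    ("user:A", "user:B"),
    ("location:plaza", "topic:curry"),
])
print(recs)
```

Pairs that contain no user, such as the location-topic pair here, produce no recommendation.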

FIG. 9 is a flowchart showing the processing of the embedded expression generation method in the embedded expression generation device 10.

In step S1, the text acquisition unit 13 acquires, based on the utterance log, utterance text, which is text representing the content of the user's utterances.

In step S2, the language understanding unit 15 performs machine learning of the language model configured as an encoder-decoder model. The processing of step S2 is described with reference to FIG. 10.

FIG. 10 is a flowchart showing the processing of the machine learning of the language model. In step S21, the language understanding unit 15 inputs, to the embedding unit en, a first user utterance text that represents the utterance content of one user from among the utterance text.

In step S22, the language understanding unit 15 acquires the user utterance embedded expression ebs encoded and output by the embedding unit en.

In step S23, the language understanding unit 15 generates the composite embedded expression ebl by combining the user utterance embedded expression with the user embedded expression of the one user. The language understanding unit 15 then inputs the composite embedded expression ebl to the decoding unit de.

In step S24, the language understanding unit 15 acquires the decoded text dt decoded by the decoding unit de.

In step S25, the language understanding unit 15 performs machine learning that adjusts the language model and the user embedded expression so as to reduce the error between the decoded text and the second user utterance text, which follows the first user utterance text in the utterance text.

In step S26, the language understanding unit 15 determines whether to end the machine learning of the language model. If it determines that the machine learning is to end, the process proceeds to step S27. Otherwise, the processing of steps S21 to S25 is repeated using utterance text (first and second user utterance texts) as teacher data.

In step S27, the language understanding unit 15 outputs the trained language model and the learned user embedded expressions. For example, the language understanding unit 15 may store the trained language model in a predetermined storage means. The language understanding unit 15 may also store the learned user embedded expressions in a predetermined storage means, or may have them managed by the user embedded expression management unit 22.

Referring again to FIG. 9, in step S3, the topic extraction unit 16 extracts topic words, which are words or phrases representing the topics of the user's utterances, from the utterance text.

In step S4, the embedded expression acquisition unit 17 inputs a topic word to the trained embedding unit en and acquires the topic embedded expression output from the embedding unit en. Here, the embedded expression acquisition unit 17 may further acquire a location embedded expression output from the embedding unit en by inputting location text representing a location to the trained embedding unit en.

In step S5, the relationship extraction unit 18 generates a relationship graph having at least users and topics as nodes, based on the users' utterance history (utterance log) and behavior history. The relationship extraction unit 18 may also generate a relationship graph that further includes locations as nodes.

In step S6, the relationship learning unit 19 trains a graph neural network in which the learned user embedded expressions and the topic embedded expressions serve as the features of the user nodes and topic nodes, respectively, in the relationship graph. The relationship graph used for learning may further include locations as nodes, with the location embedded expressions serving as the features of the location nodes.

In step S7, the relationship learning unit 19 trains the graph neural network of the relationship graph, in which the embedded expressions are the features of the nodes, thereby updating the features and weights of each node and obtaining the learned embedded expression of each node.

In step S8, the embedded expression output unit 20 outputs the embedded expression of each node obtained through learning by the relationship learning unit 19.

Next, an embedded expression generation program for causing a computer to function as the embedded expression generation device 10 of this embodiment is described with reference to FIG. 11, which shows the configuration of the program. The embedded expression generation program P1 includes a main module m10 that provides overall control of the embedded expression generation processing in the embedded expression generation device 10, an utterance log acquisition module m11, a speech recognition module m12, a text acquisition module m13, an emotion acquisition module m14, a language understanding module m15, a topic extraction module m16, an embedded expression acquisition module m17, a relationship extraction module m18, a relationship learning module m19, an embedded expression output module m20, and a link prediction module m21. The modules m11 to m21 realize the functions of the functional units 11 to 21, respectively.

The embedded expression generation program P1 may be transmitted via a transmission medium such as a communication line, or may be stored on a recording medium M1 as shown in FIG. 11.

According to the embedded expression generation device 10, the embedded expression generation method, and the embedded expression generation program P1 of this embodiment described above, a language model configured as an encoder-decoder model is trained using pairs of a first user utterance text and a second user utterance text as teacher data. The first user utterance text is input to the embedding unit to obtain a user utterance embedded expression, a composite embedded expression combining the user utterance embedded expression with the user embedded expression is input to the decoding unit, and the language model and the user embedded expression are machine-learned so as to reduce the error between the decoded text output from the decoding unit and the second user utterance text. This yields an embedding unit (encoder) that outputs a suitable topic embedded expression in response to the input of a topic word, together with a user embedded expression that suitably reflects the characteristics of the user.
Then, a relationship graph is generated in which users and topics are nodes and edges are established between nodes based on the users' utterance and behavior histories. A graph neural network is trained in which the topic embedded expressions, obtained by inputting topic words to the embedding unit, and the learned user embedded expressions serve as the features of the topic words and the users, respectively. This yields learned topic embedded expressions and user embedded expressions that suitably reflect the characteristics of the topic words and the users. Because the resulting topic embedded expressions and user embedded expressions reflect the relationships between these entities, the distance between a user and a topic can be calculated.

The invention according to the present disclosure can be understood, for example, as follows.

本開示の第１の一側面に係る埋め込み表現生成システムは、少なくともユーザ及び話題の埋め込み表現を生成する埋め込み表現生成システムであって、埋め込み部及び復号部を含むエンコーダデコーダモデルにより構成される言語モデルを学習する言語理解部であって、埋め込み部は、入力されたテキストの特徴を表す埋め込み表現を出力し、復号部は、埋め込み部からの出力を少なくとも含む埋め込み表現を復号し、ユーザの発話の内容を表す発話テキストのうちの、一のユーザの発話内容を表す第１のユーザ発話テキストを埋め込み部に入力することにより埋め込み部から出力されたユーザ発話埋め込み表現を取得し、ユーザ発話埋め込み表現と当該一のユーザの埋め込み表現であるユーザ埋め込み表現とを合成した合成埋め込み表現を復号部に入力することにより復号部から出力された復号テキストを取得し、発話テキストにおいて第１のユーザ発話テキストに引き続く第２のユーザ発話テキストと復号テキストとの誤差が小さくなるように言語モデル及びユーザ埋め込み表現を調整する機械学習を実施し、ユーザ埋め込み表現は、学習前の初期のユーザ埋め込み表現又は学習過程のユーザ埋め込み表現である、言語理解部と、発話テキストから、ユーザの発話における話題を表す語句である話題語を抽出する話題抽出部と、話題語を学習済みの埋め込み部に入力し、埋め込み部から出力される話題埋め込み表現を取得する埋め込み表現取得部と、ユーザの発話の履歴及び行動の履歴に基づいて、少なくともユーザ及び話題をノードとし、ユーザ間の対話の実績をユーザ間を接続するエッジとし、ユーザの話題語の発話の実績を当該ユーザと話題とを接続するエッジとするグラフである関係グラフを生成する関係抽出部と、学習済みのユーザ埋め込み表現及び話題埋め込み表現の各々を関係グラフにおけるユーザ及び話題のノードの特徴量とするグラフニューラルネットワークの学習により、各ノードの学習済みの埋め込み表現を得る関係学習部と、各ノードの埋め込み表現を出力する埋め込み表現出力部と、を備える。 An embedded expression generation system according to a first aspect of the present disclosure generates embedded expressions of at least users and topics, and includes the following units. A language understanding unit trains a language model configured as an encoder-decoder model including an embedding unit and a decoding unit, where the embedding unit outputs an embedded expression representing the features of input text and the decoding unit decodes an embedded expression that includes at least the output of the embedding unit. The language understanding unit obtains a user utterance embedded expression output from the embedding unit by inputting to it a first user utterance text, which represents the utterance content of one user among the utterance texts representing the content of users' utterances; obtains a decoded text output from the decoding unit by inputting to it a composite embedded expression that combines the user utterance embedded expression with the user embedded expression of that one user; and performs machine learning that adjusts the language model and the user embedded expression so as to reduce the error between the decoded text and a second user utterance text that follows the first user utterance text in the utterance text. The user embedded expression is either an initial user embedded expression before learning or a user embedded expression in the course of learning. A topic extraction unit extracts, from the utterance text, topic words, which are words or phrases expressing the topics of the users' utterances. An embedded expression acquisition unit inputs the topic words to the learned embedding unit and acquires the topic embedded expressions output from the embedding unit. A relationship extraction unit generates, based on the users' utterance history and behavior history, a relationship graph in which at least users and topics are nodes, a record of dialogue between users is an edge connecting those users, and a record of a user uttering a topic word is an edge connecting that user and the topic. A relationship learning unit obtains a learned embedded expression for each node by training a graph neural network in which the learned user embedded expressions and the topic embedded expressions serve as the features of the user nodes and topic nodes of the relationship graph. An embedded expression output unit outputs the embedded expression of each node.
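As an illustration of the relationship learning step, the following minimal sketch shows how node features can be mixed along the edges of a relationship graph. It is not the disclosed graph neural network: it assumes a single round of unweighted mean aggregation with no learned parameters, and the function name `mean_aggregate`, the node identifiers, and the two-dimensional feature vectors are all hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def mean_aggregate(features: Dict[str, Vector],
                   edges: List[Tuple[str, str]]) -> Dict[str, Vector]:
    """One round of mean-neighbor aggregation over an undirected graph.

    Assumption: a real graph neural network would also apply learned
    weights and a nonlinearity; this only shows the aggregation idea.
    """
    neighbors: Dict[str, List[str]] = {node: [] for node in features}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    updated: Dict[str, Vector] = {}
    for node, vec in features.items():
        # Average the node's own feature with those of its neighbors.
        group = [vec] + [features[n] for n in neighbors[node]]
        updated[node] = [
            sum(v[i] for v in group) / len(group) for i in range(len(vec))
        ]
    return updated


# Hypothetical initial features: user embedded expressions and a topic
# embedded expression, used as node features of the relationship graph.
features = {
    "user:A": [1.0, 0.0],
    "user:B": [0.0, 1.0],
    "topic:soccer": [1.0, 1.0],
}
# user:A talked with user:B, and user:A uttered the topic word "soccer".
edges = [("user:A", "user:B"), ("user:A", "topic:soccer")]
updated = mean_aggregate(features, edges)
```

After this step each node's vector depends on its neighbors, which is the property that lets distances between user and topic nodes reflect their relationship.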

第２の側面に係る埋め込み表現生成システムでは、第１の側面に係る埋め込み表現生成システムにおいて、ユーザの発話が発せられた時の該ユーザの感情を表す感情情報を発話の音声又は該ユーザの表情に基づいて取得し、取得された感情情報を当該発話の内容を表す発話テキストに関連付ける感情取得部、をさらに備え、言語理解部は、所定のポジティブな感情を表す感情情報が関連付けられた発話テキストを用いて、言語モデル及びユーザ埋め込み表現を調整する機械学習を実施することとしてもよい。 The embedded expression generation system according to a second aspect may, in the system according to the first aspect, further include an emotion acquisition unit that acquires, based on the voice of an utterance or the user's facial expression, emotion information representing the user's emotion at the time the utterance was made, and associates the acquired emotion information with the utterance text representing the content of that utterance. The language understanding unit may then perform the machine learning that adjusts the language model and the user embedded expression using utterance texts associated with emotion information representing a predetermined positive emotion.

上記の側面によれば、ポジティブな感情を抱いている可能性が高い時にユーザが発した発話を表す発話テキストが機械学習に用いられる。従って、教師データを構成する第１及び第２のユーザ発話テキストの組み合わせは、ユーザがポジティブな感情を抱いているときに発現する可能性が高い組合せである。このような教師データを用いて機械学習が行われることにより、ユーザにとって話題語との好適な関係が反映された話題埋め込み表現を生成可能な埋め込み部及びユーザ埋め込み表現が得られる。 According to the above aspect, utterance texts representing utterances made when the user is likely to have positive emotions are used for machine learning. The combinations of first and second user utterance texts that constitute the training data are therefore combinations likely to occur when the user has positive emotions. Machine learning on such training data yields an embedding unit and user embedded expressions capable of generating topic embedded expressions that reflect the relationship with topic words that is favorable for the user.
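The selection of training data described above can be sketched as follows. This is only an illustration under stated assumptions: the emotion labels, the `POSITIVE` set, the `Utterance` record, and the rule that both utterances of a pair must carry a positive label and come from the same user are hypothetical choices, not details taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Utterance:
    user: str
    text: str
    emotion: str  # label attached by the emotion acquisition unit (assumed)


# Assumption: which labels count as the "predetermined positive emotion".
POSITIVE = {"joy", "interest"}


def positive_training_pairs(log: List[Utterance]) -> List[Tuple[str, str, str]]:
    """Return (user, first text, second text) for consecutive utterances
    of the same user where both utterances have a positive emotion label."""
    by_user: Dict[str, List[Utterance]] = {}
    for utt in log:
        by_user.setdefault(utt.user, []).append(utt)

    pairs: List[Tuple[str, str, str]] = []
    for user, seq in by_user.items():
        for first, second in zip(seq, seq[1:]):
            if first.emotion in POSITIVE and second.emotion in POSITIVE:
                pairs.append((user, first.text, second.text))
    return pairs


log = [
    Utterance("A", "I watched the match", "joy"),
    Utterance("B", "Who won?", "neutral"),
    Utterance("A", "Our team scored twice", "joy"),
    Utterance("A", "Then it rained", "sad"),
    Utterance("B", "Great news", "joy"),
]
pairs = positive_training_pairs(log)
```

Each resulting pair would play the role of the first and second user utterance texts when adjusting the language model and the user embedded expression.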

第３の側面に係る埋め込み表現生成システムでは、第１または２の側面に係る埋め込み表現生成システムにおいて、埋め込み表現取得部は、場所を表す場所テキストを学習済みの埋め込み部に入力することにより、埋め込み部から出力される場所埋め込み表現を更に取得し、関係抽出部は、ユーザの発話の履歴及び行動の履歴に基づいて、少なくともユーザ、話題及び場所をノードとし、ユーザ間の対話の実績をユーザ間を接続するエッジとし、ユーザの話題語の発話の実績を当該ユーザと話題とを接続するエッジとし、ユーザの場所への訪問の実績を当該ユーザと場所とを接続するエッジとするグラフである関係グラフを生成し、関係学習部は、学習済みのユーザ埋め込み表現、話題埋め込み表現及び場所埋め込み表現の各々を関係グラフにおけるユーザ、話題及び場所のノードの特徴量とするグラフニューラルネットワークの学習により、各ノードの学習済みの埋め込み表現を得ることとしてもよい。 In the embedded expression generation system according to the third aspect, in the embedded expression generation system according to the first or second aspect, the embedded expression acquisition unit further acquires a location embedded expression output from the embedding unit by inputting a location text representing a location to the learned embedding unit, and the relationship extraction unit generates a relationship graph based on the user's speech history and behavior history, in which at least users, topics, and locations are nodes, records of conversations between users are edges connecting users, records of utterances of topic words by users are edges connecting the users and topics, and records of visits to locations by users are edges connecting the users and locations, and the relationship learning unit may obtain a learned embedded expression for each node by learning a graph neural network in which each of the learned user embedded expressions, topic embedded expressions, and location embedded expressions is a feature of the user, topic, and location nodes in the relationship graph.

上記の側面によれば、学習済みの言語モデルの埋め込み部に場所テキストを入力することにより、場所の特徴が好適に反映された場所埋め込み表現が得られる。そして、ユーザ、話題及び場所をノードとし、ユーザの発話及び行動の履歴に基づいてノード間にエッジが張られた関係グラフが生成され、話題埋め込み表現、場所埋め込み表現及び学習済みのユーザ埋め込み表現の各々を話題語、場所及びユーザの特徴量とするグラフニューラルネットワークの学習により、話題語、場所及びユーザの特徴が好適に反映された、学習済みの話題埋め込み表現、場所埋め込み表現及びユーザ埋め込み表現が得られる。得られた話題埋め込み表現、場所埋め込み表現及びユーザ埋め込み表現には、それらのエンティティ間の関係が反映されているので、ユーザと話題及び場所との間の距離を計算することが可能である。 According to the above aspect, by inputting location text into the embedding section of the trained language model, a location embedding representation that appropriately reflects the features of the location is obtained. Then, a relationship graph is generated in which the nodes are users, topics, and locations, and edges are drawn between the nodes based on the user's utterance and behavior history. A graph neural network is trained in which the topic embedding representation, the location embedding representation, and the trained user embedding representation are the features of the topic word, the location, and the user, respectively, to obtain trained topic embedding representations, location embedding representations, and user embedding representations that appropriately reflect the features of the topic word, the location, and the user. The obtained topic embedding representations, location embedding representations, and user embedding representations reflect the relationships between those entities, so it is possible to calculate the distance between the user and the topic and the location.
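The construction of a relationship graph with user, topic, and place nodes can be sketched as follows. The input record formats, the node identifier prefixes (`user:`, `topic:`, `place:`), and the edge kind labels are assumptions made for this example; the disclosure does not prescribe a data format.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str, str]  # (node, node, kind of record behind the edge)


def build_relationship_graph(
    dialogues: Iterable[Tuple[str, str]],
    topic_utterances: Iterable[Tuple[str, str]],
    place_visits: Iterable[Tuple[str, str]],
) -> Tuple[Set[str], Set[Edge]]:
    """Build an undirected graph from utterance and behavior history."""
    nodes: Set[str] = set()
    edges: Set[Edge] = set()

    def add_edge(a: str, b: str, kind: str) -> None:
        nodes.update((a, b))
        first, second = sorted((a, b))  # normalize so (a, b) == (b, a)
        edges.add((first, second, kind))

    for user1, user2 in dialogues:          # dialogue record: user-user edge
        add_edge(f"user:{user1}", f"user:{user2}", "dialogue")
    for user, topic in topic_utterances:    # topic word uttered: user-topic edge
        add_edge(f"user:{user}", f"topic:{topic}", "utterance")
    for user, place in place_visits:        # visit record: user-place edge
        add_edge(f"user:{user}", f"place:{place}", "visit")
    return nodes, edges


nodes, edges = build_relationship_graph(
    dialogues=[("alice", "bob"), ("bob", "alice")],
    topic_utterances=[("alice", "soccer"), ("bob", "soccer")],
    place_visits=[("bob", "stadium")],
)
```

Repeated records between the same pair collapse into one edge here; weighting edges by frequency would be an alternative design that this sketch does not attempt.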

第４の側面に係る埋め込み表現生成システムでは、第１～３の側面のいずれか一つの側面に係る埋め込み表現生成システムにおいて、学習済みの各ノードの埋め込み表現に基づいてノード間の距離を算出し、算出されたノード間の距離に基づいて、各ノード間にエッジが貼られる可能性を示すリンク予測情報を算出する、リンク予測部を更に備えることとしてもよい。 In the embedded expression generation system according to the fourth aspect, the embedded expression generation system according to any one of the first to third aspects may further include a link prediction unit that calculates the distance between nodes based on the learned embedded expression of each node, and calculates link prediction information indicating the possibility of an edge being established between each node based on the calculated distance between the nodes.

上記の側面によれば、関係グラフに関するグラフニューラルネットワークの学習により、異なる種別のエンティティ間の距離が計算可能な、実数ベクトルにより表現される埋め込み表現が得られるので、グラフの各ノード間にエッジが張られる可能性の評価が可能なリンク予測情報が算出される。従って、各ノードに対応するエンティティ間に一定程度以上の関係があることの予測が可能となる。 According to the above aspect, by learning a graph neural network about a relationship graph, an embedded representation expressed by a real number vector is obtained, which allows the distance between different types of entities to be calculated, and link prediction information is calculated that allows the evaluation of the possibility that an edge will be established between each node of the graph. Therefore, it becomes possible to predict that there is a certain degree of relationship between the entities corresponding to each node.
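A minimal sketch of distance-based link prediction follows. The disclosure states only that link prediction information is calculated from inter-node distances; the Euclidean metric and the mapping `1 / (1 + distance)` used here are assumptions chosen for illustration, as are the node names and vectors.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def link_scores(embeddings: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Score every node pair; a smaller distance gives a score closer to 1."""
    scores: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(sorted(embeddings), 2):
        distance = math.dist(embeddings[a], embeddings[b])  # Euclidean (assumed)
        scores[(a, b)] = 1.0 / (1.0 + distance)
    return scores


# Hypothetical learned embedded expressions of three nodes.
embeddings = {
    "user:alice": [0.0, 0.0],
    "topic:soccer": [3.0, 4.0],
    "place:stadium": [0.0, 0.0],
}
scores = link_scores(embeddings)
```

Because users, topics, and places share one vector space after graph learning, the same scoring function applies to pairs of different entity types.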

第５の側面に係る埋め込み表現生成システムでは、第４の側面に係る埋め込み表現生成システムにおいて、リンク予測部は、ノード間の距離に関する所与の閾値に基づいて、ノード間の距離が閾値以下である各ノードを示す情報をリンク予測情報として出力することとしてもよい。 In the embedded expression generation system according to a fifth aspect, the link prediction unit of the system according to the fourth aspect may, based on a given threshold on the inter-node distance, output as the link prediction information an indication of the nodes whose inter-node distance is equal to or less than the threshold.

上記の側面によれば、ノード間の距離が所与の閾値以下であるノードを示す情報に基づいて、所定の程度以上の関係があるエンティティに関する情報を得ることが可能となる。 According to the above aspect, information about entities that are related to at least a predetermined degree can be obtained from the information indicating the nodes whose inter-node distance is equal to or less than the given threshold.
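The threshold rule can be sketched as follows. The Euclidean metric, the function name `nodes_within`, and the example vectors are assumptions for illustration; only the comparison "distance equal to or less than the threshold" comes from the text above.

```python
import math
from typing import Dict, List


def nodes_within(embeddings: Dict[str, List[float]],
                 source: str,
                 threshold: float) -> List[str]:
    """Return the nodes whose distance from `source` is <= threshold."""
    origin = embeddings[source]
    return sorted(
        node
        for node, vector in embeddings.items()
        if node != source and math.dist(origin, vector) <= threshold
    )


# Hypothetical learned embedded expressions.
embeddings = {
    "user:alice": [0.0, 0.0],
    "topic:soccer": [3.0, 4.0],    # distance 5 from user:alice
    "topic:opera": [6.0, 8.0],     # distance 10 from user:alice
    "place:stadium": [0.0, 2.0],   # distance 2 from user:alice
}
related = nodes_within(embeddings, "user:alice", threshold=5.0)
```

Here `related` lists the topic and place nodes predicted to be linked to the user, including the node lying exactly at the threshold distance.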

第６の側面に係る埋め込み表現生成システムでは、第１～５の側面いずれか一つの側面に係る埋め込み表現生成システムにおいて、発話テキストは、所定の仮想空間におけるユーザの発話の内容を表す音声又はテキストの発話ログに基づいて取得されることとしてもよい。 In the embedded expression generation system according to a sixth aspect, in the system according to any one of the first to fifth aspects, the utterance text may be obtained from a speech log of audio or text representing the content of users' utterances in a predetermined virtual space.

上記の側面によれば、仮想空間においては、ユーザの発話を表す音声又はテキストを容易に取得できるので、発話テキストの取得が容易となる。 According to the above aspect, audio or text representing users' utterances is easy to acquire in a virtual space, so the utterance text is easy to obtain.

第７の側面に係る埋め込み表現生成システムでは、第１～６の側面いずれか一つの側面に係る埋め込み表現生成システムにおいて、関係抽出部は、所定の仮想空間におけるユーザの発話の履歴及び行動の履歴に基づいて、関係グラフを生成することとしてもよい。 In the embedded expression generation system according to a seventh aspect, in the system according to any one of the first to sixth aspects, the relationship extraction unit may generate the relationship graph based on users' utterance history and behavior history in a predetermined virtual space.

上記の側面によれば、仮想空間においては、ユーザの発話の履歴及び行動の履歴を容易に取得できるので、関係グラフを容易に生成できる。 According to the above aspect, since a user's speech history and behavior history can be easily acquired in a virtual space, a relationship graph can be easily generated.

以上、本開示について詳細に説明したが、当業者にとっては、本開示が本開示中に説明した実施形態に限定されるものではないということは明らかである。本開示は、請求の範囲の記載により定まる本開示の趣旨及び範囲を逸脱することなく修正及び変更態様として実施することができる。したがって、本開示の記載は、例示説明を目的とするものであり、本開示に対して何ら制限的な意味を有するものではない。 Although the present disclosure has been described in detail above, it is clear to those skilled in the art that the present disclosure is not limited to the embodiments described herein. The present disclosure can be implemented in modified and altered forms without departing from the spirit and scope of the present disclosure as defined by the claims. Therefore, the description of the present disclosure is intended as an illustrative example and does not have any limiting meaning on the present disclosure.

情報の通知は、本開示において説明した態様／実施形態に限られず、他の方法を用いて行われてもよい。例えば、情報の通知は、物理レイヤシグナリング（例えば、ＤＣＩ（Downlink Control Information）、ＵＣＩ（Uplink Control Information））、上位レイヤシグナリング（例えば、ＲＲＣ（Radio Resource Control）シグナリング、ＭＡＣ（Medium Access Control）シグナリング、報知情報（ＭＩＢ（Master Information Block）、ＳＩＢ（System Information Block）））、その他の信号又はこれらの組み合わせによって実施されてもよい。また、ＲＲＣシグナリングは、ＲＲＣメッセージと呼ばれてもよく、例えば、ＲＲＣ接続セットアップ（RRC Connection Setup）メッセージ、ＲＲＣ接続再構成（RRC Connection Reconfiguration）メッセージなどであってもよい。 The notification of information is not limited to the aspects/embodiments described in the present disclosure, and may be performed using other methods. For example, the notification of information may be performed by physical layer signaling (e.g., Downlink Control Information (DCI), Uplink Control Information (UCI)), higher layer signaling (e.g., Radio Resource Control (RRC) signaling, Medium Access Control (MAC) signaling, broadcast information (Master Information Block (MIB), System Information Block (SIB))), other signals, or a combination of these. In addition, the RRC signaling may be called an RRC message, and may be, for example, an RRC Connection Setup message, an RRC Connection Reconfiguration message, etc.

本明細書で説明した各態様／実施形態は、ＬＴＥ（Long Term Evolution）、ＬＴＥ－Ａ（LTE-Advanced）、ＳＵＰＥＲ３Ｇ、ＩＭＴ－Ａｄｖａｎｃｅｄ、４Ｇ、５Ｇ、ＦＲＡ（Future Radio Access）、Ｗ－ＣＤＭＡ（登録商標）、ＧＳＭ（登録商標）、ＣＤＭＡ２０００、ＵＭＢ（Ultra Mobile Broadband）、ＩＥＥＥ８０２．１１（Ｗｉ－Ｆｉ）、ＩＥＥＥ８０２．１６（ＷｉＭＡＸ）、ＩＥＥＥ８０２．２０、ＵＷＢ（Ultra-WideBand）、Ｂｌｕｅｔｏｏｔｈ（登録商標）、その他の適切なシステムを利用するシステム及び／又はこれらに基づいて拡張された次世代システムに適用されてもよい。また、複数のシステムが組み合わされて（例えば、ＬＴＥ及びＬＴＥ－Ａの少なくとも一方と５Ｇとの組み合わせ等）適用されてもよい。 Each aspect/embodiment described herein may be applied to systems using LTE (Long Term Evolution), LTE-Advanced (LTE-A), SUPER 3G, IMT-Advanced, 4G, 5G, FRA (Future Radio Access), W-CDMA (registered trademark), GSM (registered trademark), CDMA2000, UMB (Ultra Mobile Broadband), IEEE 802.11 (Wi-Fi), IEEE 802.16 (WiMAX), IEEE 802.20, UWB (Ultra-WideBand), Bluetooth (registered trademark), or other suitable systems and/or next generation systems enhanced based on these. In addition, multiple systems may be combined (e.g., a combination of at least one of LTE and LTE-A with 5G, etc.).

本明細書で説明した各態様／実施形態の処理手順、シーケンス、フローチャートなどは、矛盾の無い限り、順序を入れ替えてもよい。例えば、本明細書で説明した方法については、例示的な順序で様々なステップの要素を提示しており、提示した特定の順序に限定されない。 The steps, sequences, flow charts, etc. of each aspect/embodiment described herein may be reordered unless inconsistent. For example, the methods described herein present elements of various steps in an example order and are not limited to the particular order presented.

本開示において基地局によって行われるとした特定動作は、場合によってはその上位ノード（upper node）によって行われることもある。基地局を有する１つ又は複数のネットワークノード（network nodes）からなるネットワークにおいて、端末との通信のために行われる様々な動作は、基地局及び基地局以外の他のネットワークノード（例えば、ＭＭＥ又はＳ－ＧＷなどが考えられるが、これらに限られない）の少なくとも１つによって行われ得ることは明らかである。上記において基地局以外の他のネットワークノードが１つである場合を例示したが、複数の他のネットワークノードの組み合わせ（例えば、ＭＭＥ及びＳ－ＧＷ）であってもよい。 Specific operations that are described as being performed by a base station in this disclosure may also be performed by its upper node in some cases. In a network consisting of one or more network nodes having a base station, it is clear that various operations performed for communication with a terminal may be performed by at least one of the base station and other network nodes other than the base station (e.g., MME or S-GW, etc., but are not limited to these). Although the above example shows a case where there is one other network node other than the base station, it may also be a combination of multiple other network nodes (e.g., MME and S-GW).

情報等（※「情報、信号」の項目参照）は、上位レイヤ（又は下位レイヤ）から下位レイヤ（又は上位レイヤ）へ出力され得る。複数のネットワークノードを介して入出力されてもよい。 Information, etc. (see the "Information, Signals" section) can be output from a higher layer (or a lower layer) to a lower layer (or a higher layer). It may also be input and output via multiple network nodes.

入出力された情報等は特定の場所(例えば、メモリ)に保存されてもよいし、管理テーブルで管理してもよい。入出力される情報等は、上書き、更新、または追記され得る。出力された情報等は削除されてもよい。入力された情報等は他の装置へ送信されてもよい。 The input and output information may be stored in a specific location (e.g., memory) or may be managed in a management table. The input and output information may be overwritten, updated, or added to. The output information may be deleted. The input information may be sent to another device.

判定は、１ビットで表される値（０か１か）によって行われてもよいし、真偽値（Boolean：trueまたはfalse）によって行われてもよいし、数値の比較（例えば、所定の値との比較）によって行われてもよい。 The determination may be based on a value represented by one bit (0 or 1), a Boolean (true or false) value, or a numerical comparison (e.g., with a predetermined value).

本開示において説明した各態様／実施形態は単独で用いてもよいし、組み合わせて用いてもよいし、実行に伴って切り替えて用いてもよい。また、所定の情報の通知（例えば、「Ｘであること」の通知）は、明示的に行うものに限られず、暗黙的（例えば、当該所定の情報の通知を行わない）ことによって行われてもよい。 Each aspect/embodiment described in this disclosure may be used alone, in combination, or switched during execution. In addition, notification of predetermined information (e.g., notification that "X is the case") is not limited to explicit notification and may be made implicitly (e.g., by not giving notification of the predetermined information).

ソフトウェアは、ソフトウェア、ファームウェア、ミドルウェア、マイクロコード、ハードウェア記述言語と呼ばれるか、他の名称で呼ばれるかを問わず、命令、命令セット、コード、コードセグメント、プログラムコード、プログラム、サブプログラム、ソフトウェアモジュール、アプリケーション、ソフトウェアアプリケーション、ソフトウェアパッケージ、ルーチン、サブルーチン、オブジェクト、実行可能ファイル、実行スレッド、手順、機能などを意味するよう広く解釈されるべきである。 Software shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executable files, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.

また、ソフトウェア、命令などは、伝送媒体を介して送受信されてもよい。例えば、ソフトウェアが、同軸ケーブル、光ファイバケーブル、ツイストペア及びデジタル加入者回線（ＤＳＬ）などの有線技術及び／又は赤外線、無線及びマイクロ波などの無線技術を使用してウェブサイト、サーバ、又は他のリモートソースから送信される場合、これらの有線技術及び／又は無線技術は、伝送媒体の定義内に含まれる。 Software, instructions, etc. may also be transmitted and received over a transmission medium. For example, if the software is transmitted from a website, server, or other remote source using wired technologies, such as coaxial cable, fiber optic cable, twisted pair, and digital subscriber line (DSL), and/or wireless technologies, such as infrared, radio, and microwave, these wired and/or wireless technologies are included within the definition of a transmission medium.

本開示において説明した情報、信号などは、様々な異なる技術のいずれかを使用して表されてもよい。例えば、上記の説明全体に渡って言及され得るデータ、命令、コマンド、情報、信号、ビット、シンボル、チップなどは、電圧、電流、電磁波、磁界若しくは磁性粒子、光場若しくは光子、又はこれらの任意の組み合わせによって表されてもよい。 The information, signals, etc. described in this disclosure may be represented using any of a variety of different technologies. For example, data, instructions, commands, information, signals, bits, symbols, chips, etc. that may be referred to throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, optical fields or photons, or any combination thereof.

なお、本開示において説明した用語及び／又は本明細書の理解に必要な用語については、同一の又は類似する意味を有する用語と置き換えてもよい。 In addition, terms explained in this disclosure and/or terms necessary for understanding this specification may be replaced with terms having the same or similar meanings.

本明細書で使用する「システム」および「ネットワーク」という用語は、互換的に使用される。 As used herein, the terms "system" and "network" are used interchangeably.

また、本明細書で説明した情報、パラメータなどは、絶対値で表されてもよいし、所定の値からの相対値で表されてもよいし、対応する別の情報で表されてもよい。例えば、無線リソースはインデックスによって指示されるものであってもよい。 In addition, the information, parameters, etc. described in this specification may be expressed as absolute values, as relative values from a predetermined value, or as corresponding other information. For example, radio resources may be indicated by an index.

上述したパラメータに使用する名称はいかなる点においても限定的な名称ではない。さらに、これらのパラメータを使用する数式等は、本開示で明示的に開示したものと異なる場合もある。様々なチャネル（例えば、ＰＵＣＣＨ、ＰＤＣＣＨなど）及び情報要素は、あらゆる好適な名称によって識別できるので、これらの様々なチャネル及び情報要素に割り当てている様々な名称は、いかなる点においても限定的な名称ではない。 The names used for the parameters described above are not intended to be limiting in any way. Furthermore, the formulas etc. using these parameters may differ from those explicitly disclosed in this disclosure. The various channels (e.g., PUCCH, PDCCH, etc.) and information elements may be identified by any suitable names, and therefore the various names assigned to these various channels and information elements are not intended to be limiting in any way.

本開示で使用する「判断(determining)」、「決定(determining)」という用語は、多種多様な動作を包含する場合がある。「判断」、「決定」は、例えば、判定(judging)、計算(calculating)、算出(computing)、処理(processing)、導出(deriving)、調査(investigating)、探索(looking up、search、inquiry)（例えば、テーブル、データベース又は別のデータ構造での探索）、確認(ascertaining)した事を「判断」「決定」したとみなす事などを含み得る。また、「判断」、「決定」は、受信(receiving)（例えば、情報を受信すること）、送信(transmitting)(例えば、情報を送信すること)、入力(input)、出力(output)、アクセス(accessing)（例えば、メモリ中のデータにアクセスすること）した事を「判断」「決定」したとみなす事などを含み得る。また、「判断」、「決定」は、解決(resolving)、選択(selecting)、選定(choosing)、確立(establishing)、比較(comparing)などした事を「判断」「決定」したとみなす事を含み得る。つまり、「判断」「決定」は、何らかの動作を「判断」「決定」したとみなす事を含み得る。また、「判断（決定）」は、「想定する（assuming）」、「期待する（expecting）」、「みなす（considering）」などで読み替えられてもよい。 As used in this disclosure, the terms "judging" and "deciding" (both corresponding to "determining") may encompass a wide variety of actions. "Judging" and "deciding" may include, for example, regarding an act of judging, calculating, computing, processing, deriving, investigating, looking up (searching, inquiring) (e.g., looking up in a table, database, or other data structure), or ascertaining as an act of "judging" or "deciding." "Judging" and "deciding" may also include regarding an act of receiving (e.g., receiving information), transmitting (e.g., transmitting information), input, output, or accessing (e.g., accessing data in memory) as an act of "judging" or "deciding." "Judging" and "deciding" may further include regarding an act of resolving, selecting, choosing, establishing, comparing, or the like as an act of "judging" or "deciding." In other words, "judging" and "deciding" may include regarding some action as an act of "judging" or "deciding." In addition, "judging (deciding)" may be read as "assuming," "expecting," "considering," and the like.

本開示で使用する「に基づいて」という記載は、別段に明記されていない限り、「のみに基づいて」を意味しない。言い換えれば、「に基づいて」という記載は、「のみに基づいて」と「に少なくとも基づいて」の両方を意味する。 As used in this disclosure, the phrase "based on" does not mean "based only on," unless expressly stated otherwise. In other words, the phrase "based on" means both "based only on" and "based at least on."

本明細書で「第１の」、「第２の」などの呼称を使用した場合においては、その要素へのいかなる参照も、それらの要素の量または順序を全般的に限定するものではない。これらの呼称は、２つ以上の要素間を区別する便利な方法として本明細書で使用され得る。したがって、第１および第２の要素への参照は、２つの要素のみがそこで採用され得ること、または何らかの形で第１の要素が第２の要素に先行しなければならないことを意味しない。 When designations such as "first," "second," and the like are used herein, any reference to that element is not intended to generally limit the quantity or order of those elements. These designations may be used herein as a convenient way to distinguish between two or more elements. Thus, a reference to a first and a second element does not imply that only two elements may be employed therein or that the first element must precede the second element in some way.

「含む（include）」、「含んでいる（including）」、およびそれらの変形が、本明細書あるいは特許請求の範囲で使用されている限り、これら用語は、用語「備える(comprising)」と同様に、包括的であることが意図される。さらに、本明細書あるいは特許請求の範囲において使用されている用語「または（or）」は、排他的論理和ではないことが意図される。 To the extent that the terms "include," "including," and variations thereof are used in this specification or the claims, these terms are intended to be inclusive, in the same way as the term "comprising." Further, the term "or" as used in this specification or the claims is not intended to be an exclusive or.

本開示において、例えば、英語でのa, an及びtheのように、翻訳により冠詞が追加された場合、本開示は、これらの冠詞の後に続く名詞が複数形であることを含んでもよい。 In this disclosure, where articles such as a, an, and the in English have been added through translation, this disclosure may include cases where the nouns following these articles are plural.

本開示において、「ＡとＢが異なる」という用語は、「ＡとＢが互いに異なる」ことを意味してもよい。なお、当該用語は、「ＡとＢがそれぞれＣと異なる」ことを意味してもよい。「離れる」、「結合される」などの用語も、「異なる」と同様に解釈されてもよい。 In this disclosure, the term "A and B are different" may mean "A and B are different from each other." The term may also mean "A and B are each different from C." Terms such as "separate" and "combined" may also be interpreted in the same way as "different."

１…埋め込み表現生成システム、１０…埋め込み表現生成装置、１１…発話ログ取得部、１２…音声認識部、１３…テキスト取得部、１４…感情取得部、１５…言語理解部、１６…話題抽出部、１７…埋め込み表現取得部、１８…関係抽出部、１９…関係学習部、２０…埋め込み表現出力部、２１…リンク予測部、２２…表現管理部、ｄｅ…復号部、ｅｎ…埋め込み部、ｇｎ…関係グラフ、Ｍ１…記録媒体、ｍ１０…メインモジュール、ｍ１１…発話ログ取得モジュール、ｍ１２…音声認識モジュール、ｍ１３…テキスト取得モジュール、ｍ１４…感情取得モジュール、ｍ１５…言語理解モジュール、ｍ１６…話題抽出モジュール、ｍ１７…埋め込み表現取得モジュール、ｍ１８…関係抽出モジュール、ｍ１９…関係学習モジュール、ｍ２０…埋め込み表現出力モジュール、ｍ２１…リンク予測モジュール、ｍｄ…言語モデル、Ｐ１…埋め込み表現生成プログラム。 1...embedded expression generation system, 10...embedded expression generation device, 11...speech log acquisition unit, 12...speech recognition unit, 13...text acquisition unit, 14...emotion acquisition unit, 15...language understanding unit, 16...topic extraction unit, 17...embedded expression acquisition unit, 18...relationship extraction unit, 19...relationship learning unit, 20...embedded expression output unit, 21...link prediction unit, 22...expression management unit, de...decoding unit, en...embedding unit, gn...relationship graph, M1...recording medium, m10...main module, m11...speech log acquisition module, m12...speech recognition module, m13...text acquisition module, m14...emotion acquisition module, m15...language understanding module, m16...topic extraction module, m17...embedded expression acquisition module, m18...relationship extraction module, m19...relationship learning module, m20...embedded expression output module, m21...link prediction module, md...language model, P1...embedded expression generation program.

Claims

An embedded expression generation system that generates embedded expressions of at least users and topics, the system comprising:
a language understanding unit that trains a language model configured as an encoder-decoder model including an embedding unit and a decoding unit, wherein
the embedding unit outputs an embedded expression representing features of input text,
the decoding unit decodes an embedded expression including at least the output from the embedding unit,
the language understanding unit inputs, to the embedding unit, a first user utterance text representing the utterance content of one user among utterance texts representing the content of users' utterances, thereby obtaining a user utterance embedded expression output from the embedding unit; inputs, to the decoding unit, a composite embedded expression obtained by combining the user utterance embedded expression with a user embedded expression that is the embedded expression of the one user, thereby obtaining a decoded text output from the decoding unit; and performs machine learning that adjusts the language model and the user embedded expression so as to reduce an error between the decoded text and a second user utterance text that follows the first user utterance text in the utterance text, and
the user embedded expression is an initial user embedded expression before learning or a user embedded expression in the course of learning;
a topic extraction unit that extracts, from the utterance text, topic words, which are words or phrases expressing topics in the users' utterances;
an embedded expression acquisition unit that inputs the topic words to the learned embedding unit and acquires topic embedded expressions output from the embedding unit;
a relationship extraction unit that generates, based on the users' utterance history and behavior history, a relationship graph in which at least users and topics are nodes, a record of dialogue between users is an edge connecting those users, and a record of a user uttering a topic word is an edge connecting that user and the topic;
a relationship learning unit that obtains a learned embedded expression of each node by training a graph neural network in which the learned user embedded expressions and the topic embedded expressions serve as features of the user nodes and topic nodes in the relationship graph; and
an embedded expression output unit that outputs the learned embedded expression of each node.

The embedded expression generation system according to claim 1, further comprising
an emotion acquisition unit that acquires, based on the voice of an utterance or a facial expression of the user, emotion information representing the emotion of the user at the time the utterance was made, and associates the acquired emotion information with the utterance text representing the content of the utterance,
wherein the language understanding unit performs the machine learning that adjusts the language model and the user embedded expression using utterance text associated with emotion information representing a predetermined positive emotion.

The embedded expression generation system according to claim 1 or 2, wherein
the embedded expression acquisition unit further acquires a place embedded expression output from the embedding unit by inputting a place text representing a place to the learned embedding unit,
the relationship extraction unit generates, based on the users' utterance history and behavior history, the relationship graph in which at least users, topics, and places are nodes, a record of dialogue between users is an edge connecting those users, a record of a user uttering a topic word is an edge connecting that user and the topic, and a record of a user visiting a place is an edge connecting that user and the place, and
the relationship learning unit obtains a learned embedded expression of each node by training a graph neural network in which the learned user embedded expressions, the topic embedded expressions, and the place embedded expressions serve as features of the user, topic, and place nodes in the relationship graph.

The embedded expression generation system according to claim 1, further comprising
a link prediction unit that calculates a distance between nodes based on the learned embedded expression of each node, and calculates, based on the calculated distance between the nodes, link prediction information indicating the possibility that an edge will be established between the nodes.

The embedded expression generation system according to claim 4, wherein
the link prediction unit outputs, as the link prediction information and based on a given threshold on the distance between nodes, information indicating each node whose inter-node distance is equal to or less than the threshold.

The embedded expression generation system according to claim 1, wherein
the utterance text is obtained based on a speech log of audio or text representing the content of the users' utterances in a predetermined virtual space.

The embedded expression generation system according to claim 1, wherein
the relationship extraction unit generates the relationship graph based on the utterance history and the behavior history of the users in a predetermined virtual space.