JP7258686B2

JP7258686B2 - Information processing system, information processing method, and program

Info

Publication number: JP7258686B2
Application number: JP2019134713A
Authority: JP
Inventors: 尚史福江
Original assignee: TIS Inc
Current assignee: TIS Inc
Priority date: 2019-07-22
Filing date: 2019-07-22
Publication date: 2023-04-17
Anticipated expiration: 2039-07-22
Also published as: JP2021018664A

Description

The present invention relates to an information processing system, an information processing method, and a program.

A determination device is disclosed that determines whether to cause a speaker to output sound based on image information detected by an image sensor (Patent Document 1).

Patent Document 1: JP 2019-35897 A

Patent Document 1 discloses a determination device that determines, based on image information, the timing of audio output from a speaker installed in a user's residence. The determination device of Patent Document 1 also causes the speaker to output audio at a timing when audio information is interrupted. With this device, the speaker can output sound according to the situation in the residence. However, the determination device of Patent Document 1 cannot cause the speaker to actively output voice to the user, which makes it insufficient for helping meetings and similar events proceed smoothly.

The present invention has been made in view of the above problem, and an object thereof is to provide a system that actively outputs voice based on user information about a user who uses a speaker.

An information processing system according to one aspect of the present invention includes: a user information acquisition unit that acquires user information about a user who uses a speaker device including a microphone and a speaker; a response content specifying unit that specifies predetermined response content based on the user information acquired by the user information acquisition unit; and a transmission unit that transmits, to the speaker device, audio information based on the predetermined response content specified by the response content specifying unit, so as to cause the speaker device to output voice in accordance with the predetermined response content.

An information processing method according to one aspect of the present invention causes a computer to perform: a user information acquisition step of acquiring user information about a user who uses a speaker device including a microphone and a speaker; a response content specifying step of specifying predetermined response content based on the user information acquired in the user information acquisition step; and a transmission step of transmitting, to the speaker device, audio information based on the predetermined response content specified in the response content specifying step, so as to cause the speaker device to output voice in accordance with the predetermined response content.

A program according to one aspect of the present invention causes a computer to: acquire user information about a user who uses a speaker device including a microphone and a speaker; specify predetermined response content based on the user information; and transmit, to the speaker device, audio information based on the specified predetermined response content, so as to cause the speaker device to output voice in accordance with the predetermined response content.

According to the present invention, actively outputting voice to the user based on the user information can encourage the user to speak.

FIG. 1 is a diagram showing an example of the configuration of a voice notification system.
FIG. 2 is a diagram showing an overview of processing in the voice notification system.
FIG. 3 is a diagram showing an example of the functional configuration of a response server device.
FIG. 4 is a diagram showing an example of a speaker information table.
FIG. 5 is a diagram showing an example of a user information table.
FIG. 6 is a diagram showing an example of a minutes information table.
FIG. 7 is a diagram showing an example of an image information table.
FIG. 8 is a diagram showing an example of a response content table.
FIG. 9A is a diagram showing an example of a method of acquiring user information.
FIG. 9B is a diagram showing another example of a method of acquiring user information.
FIG. 10 is a diagram showing an example of the functional configuration of a speaker device.
FIG. 11 is a diagram showing an example of the functional configuration of a user terminal device.
FIG. 12 is a flowchart showing an example of processing of the response server device.
FIG. 13 is a diagram showing an example of the hardware configuration of a computer.
FIG. 14 is a flowchart showing another example of processing of the response server device.

A voice notification system 1 according to an embodiment of the present invention is described in detail below with reference to the drawings. The embodiment described below is merely an example and is not intended to exclude various modifications and applications of techniques not explicitly described below. That is, the present invention can be practiced with various modifications, or by combining embodiments, without departing from its spirit. In the following description of the drawings, identical or similar parts are denoted by identical or similar reference signs.
==Configuration==

FIG. 1 is a diagram showing an example of the configuration of the voice notification system 1. As shown in FIG. 1, the voice notification system 1 includes, for example, a response server device 10, a speaker device 20, and a user terminal device 30. The functions of the response server device 10 and the speaker device 20 may be implemented in a single system. Alternatively, the functions of each of the response server device 10 and the speaker device 20 may be distributed across a plurality of other devices. Each component of the voice notification system 1 is described below.

The response server device 10 is a device that causes the speaker device 20 to actively speak to the user by transmitting predetermined audio information to the speaker device 20. The response server device 10 is composed of an information processing device such as a server computer, and is connected to the speaker device 20 and the user terminal device 30 via a network 200. Transmission and reception of various data between the response server device 10 and the other devices are described later.

The voice notification system 1 may also include a voice recognition server device (not shown). In this case, the response server device 10 functions as a back end when the voice recognition server device recognizes the user's voice using various conventional techniques, analyzes it, and executes a predetermined response. That is, the speaker device 20, described later, may use the functions of the response server device 10 by calling them through an API (Application Programming Interface). Description of the functions of the voice recognition server device is omitted below where appropriate.

The speaker device 20 is a device that acquires voice from the user, converts it into audio information, and transmits the audio information to the response server device 10. It also speaks based on audio information acquired from the response server device 10. The speaker device 20 is a so-called smart speaker. To aid understanding of the following description, an example hardware configuration of the speaker device 20 is described here. The speaker device 20 includes, for example, a microphone that detects sound and converts it into an electric signal, a speaker that outputs audio information acquired from the response server device 10 as sound, a communication module for communicating with external devices, a display unit that visually indicates the status of the speaker device 20, operation buttons for giving various operation instructions, a support button for communicating with a support operator, and a control unit that controls each component. The function of the support button is described later. Various types of speaker device 20 exist, for example, devices having multiple microphones and multiple speakers, devices with microphones arranged at equal intervals around the outer periphery of the top surface, and devices with speakers arranged at equal intervals around the outer periphery of the side surface; the specifications are not limited.

The user terminal device 30 is a user's terminal device, such as a smartphone. The user terminal device 30 is connected to the response server device 10 via the network 200 and displays various information transmitted from the response server device 10 on the display device 107 shown in FIG. 13. The user makes various requests to the response server device 10 using the user terminal device 30. The user terminal device 30 may also be connected to the speaker device 20 by short-range wireless communication.
==Overview of Voice Notification System 1==

FIG. 2 is a diagram showing an overview of processing in the voice notification system 1. An overview of the operation of the voice notification system 1 is described with reference to FIG. 2.

First, in S1, the speaker device 20 transmits speaker information for identifying the speaker device 20 to the response server device 10. This allows the response server device 10 to know the installation location, functions, and the like of the speaker device 20.

Next, in S2, the user terminal device 30 transmits user information for identifying the user terminal device 30 to the response server device 10, either via the speaker device 20 or directly. This allows the response server device 10 to identify which speaker device 20 the user terminal device 30 is associated with.

Next, in S3, the response server device 10 transmits, to the speaker device 20, audio information specified under predetermined conditions based on the speaker information and the user information. As a result, before the user speaks to the speaker device 20, the speaker device 20 speaks to the user with content appropriate for the user at an appropriate timing. The content and timing of the voice output from the speaker device 20 are described later.

Next, in S4, the response server device 10 acquires the user's voice as audio information via the speaker device 20 and transmits audio information specified based on that audio information to the speaker device 20. In this way, various information is provided to the user by voice through the speaker device 20. The information provided is described later.
== Configuration of voice notification system 1 ==

The functions of the response server device 10, the speaker device 20, and the user terminal device 30 are described below. For ease of understanding, the following description assumes, as an example, that a user uses a speaker device 20 installed in a given conference room during a meeting reserved through a reservation management system (not shown).
<<response server device 10>>

The functional configuration of the response server device 10 is described with reference to FIG. 3. FIG. 3 is a diagram showing an example of the functional configuration of the response server device 10. As shown in FIG. 3, the response server device 10 has the functions of a storage unit 11, a speaker information acquisition unit 12a, a user information acquisition unit 12b, a reservation information acquisition unit 12c, a recognition unit 13, a minutes identification unit 14, an image identification unit 15, a response content specifying unit 16, an analysis unit 17, and a transmission unit 18. These functions are implemented by, for example, the processor 101 shown in FIG. 13 reading and executing a computer program stored in the storage device 103 and controlling each component of the response server device 10.

The storage unit 11 has, for example, a speaker information table 11a, a user information table 11b, a minutes information table 11c, an image information table 11d, and a response content table 11e. Each table is only an example, and its contents are not particularly limited.

The speaker information table 11a is a table that stores speaker information. As shown in FIG. 4, the speaker information table 11a is a collection of records, each using an appropriate item such as a speaker ID as its primary key and containing data such as the installation location and specifications. The speaker ID is an identification code that identifies a speaker; any unique number will do. The installation location is where the speaker device 20 is installed, such as a conference room number. The specifications are the functional specifications of the speaker device 20. The contents of the speaker information table 11a are updated as appropriate by, for example, the administrator of the response server device 10. By referring to the speaker information table 11a, the response server device 10 can identify the speaker device 20.
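The table layout above can be pictured as a keyed record store. The sketch below is a minimal in-memory illustration, assuming string speaker IDs; the class and field names (`SpeakerRecord`, `SpeakerInfoTable`, `upsert`, `find`) are hypothetical and not taken from the patent, which leaves the storage technology open.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SpeakerRecord:
    speaker_id: str  # primary key: any unique identification code
    location: str    # installation location, e.g. a conference room number
    spec: str        # functional specification of the speaker device


class SpeakerInfoTable:
    """Hypothetical in-memory stand-in for the speaker information table 11a."""

    def __init__(self) -> None:
        self._rows: Dict[str, SpeakerRecord] = {}

    def upsert(self, record: SpeakerRecord) -> None:
        # Keyed by speaker ID, so a record sent again for the same device replaces the old one.
        self._rows[record.speaker_id] = record

    def find(self, speaker_id: str) -> Optional[SpeakerRecord]:
        # Returns None when the speaker ID is unknown.
        return self._rows.get(speaker_id)
```

Keying on the speaker ID means the speaker information that a speaker device 20 sends in S1 can simply overwrite any stale entry for the same device.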

The user information table 11b is a table that stores user information. As shown in FIG. 5, the user information table 11b is a collection of records, each using an appropriate item such as a user ID as its primary key and containing data such as a name and a job title. The user ID is an identification code that identifies a user, for example, an arbitrary unique number, a mobile phone number, a MAC (Media Access Control) address, or a BD (Bluetooth (registered trademark) Device) address. The name is the user's name, and the job title is the user's position. The contents of the user information table 11b are updated as appropriate by, for example, the administrator of the response server device 10. By referring to the user information table 11b, the response server device 10 can identify the user who is using the speaker device 20.

The minutes information table 11c is a table that stores minutes information indicating past meeting minutes. As shown in FIG. 6, the minutes information table 11c is a collection of records, each using an appropriate item such as a minutes ID as its primary key and containing data such as the minutes content and a meeting ID. The minutes ID is an identification code that identifies the minutes. The minutes content is data recording the minutes of a past meeting. The meeting ID is an identification code that identifies the meeting to which the minutes correspond. The contents of the minutes information table 11c are updated automatically at the end of each meeting. By referring to the minutes information table 11c, the response server device 10 can present the user with the minutes of past meetings corresponding to the current meeting, which helps the meeting run smoothly.
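Looking up the past minutes for a meeting reduces to filtering on the meeting ID. The following is a sketch only. It assumes that a recurring meeting keeps the same meeting ID across sessions and that minutes IDs increase as meetings end; neither assumption is stated in the patent, and the names are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MinutesRecord:
    minutes_id: int   # primary key
    meeting_id: str   # meeting to which these minutes correspond
    content: str      # recorded minutes text


def minutes_for_meeting(table: List[MinutesRecord], meeting_id: str) -> List[MinutesRecord]:
    """Return the past minutes of one meeting, oldest first.

    Assumes minutes IDs are issued in increasing order as meetings end.
    """
    hits = [r for r in table if r.meeting_id == meeting_id]
    return sorted(hits, key=lambda r: r.minutes_id)
```

Under these assumptions, the last element of the result is the most recent set of minutes, which is what the minutes identification unit 14 would typically show.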

The image information table 11d is a table that stores, for example, images needed for running meetings smoothly. As shown in FIG. 7, the image information table 11d is a collection of records, each using an appropriate item such as an image ID as its primary key and containing data such as an image (including video). An image here is a still image, a moving image, or the like, and may include presentation materials used in a meeting. The contents of the image information table 11d are updated as appropriate by, for example, the administrator of the response server device 10. By referring to the image information table 11d, the response server device 10 can present images corresponding to the meeting to the user, which helps the meeting run smoothly.

The response content table 11e is a table that stores response content corresponding to, for example, user information and audio information. As shown in FIG. 8, the response content table 11e is a collection of records, each using an appropriate item such as a response content ID as its primary key and containing data such as a speaker ID, user ID, image ID, minutes ID, keyword, and response content. In other words, each response content ID is associated with an image ID, a minutes ID, and so on. For example, the response server device 10 identifies the speaker ID of the speaker device 20 used in a given meeting, the user ID of the user terminal device 30 identified in the conference room, and so on, and then identifies the corresponding response content ID, thereby specifying the response content associated with that ID. The response content is a simple or compound sentence corresponding to the meeting, the audio information, or the user information. It indicates what is to be transmitted to the speaker device 20, such as voice, images, or minutes. A keyword is, for example, a word in text information obtained by converting audio information, and the response server device 10 can also specify response content based on a keyword. The contents of the response content table 11e are updated as appropriate by, for example, the administrator of the response server device 10. By referring to the response content table 11e, the response server device 10 can cause the speaker device 20 to output voice suited to the meeting or the user, which helps the meeting run smoothly.
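One way to picture the lookup is as a filter over the table rows. The sketch below makes several assumptions that the patent does not state: an empty (`None`) speaker or user column matches anything, a keyword matches by plain substring search on the recognized text, and the names `ResponseRecord` and `find_responses` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ResponseRecord:
    response_id: int                  # primary key
    response: str                     # sentence to be spoken
    speaker_id: Optional[str] = None  # None = applies to any speaker (assumption)
    user_id: Optional[str] = None     # None = applies to any user (assumption)
    keyword: Optional[str] = None     # must appear in the recognized text, if set
    image_id: Optional[str] = None    # image to show alongside, if any
    minutes_id: Optional[int] = None  # minutes to show alongside, if any


def _matches(field: Optional[str], value: Optional[str]) -> bool:
    # An empty (None) field acts as a wildcard; otherwise the values must be equal.
    return field is None or field == value


def find_responses(table: List[ResponseRecord], speaker_id: str,
                   user_id: Optional[str] = None, text: str = "") -> List[ResponseRecord]:
    """Return records matching the speaker, the user, and (if set) a keyword in `text`."""
    return [r for r in table
            if _matches(r.speaker_id, speaker_id)
            and _matches(r.user_id, user_id)
            and (r.keyword is None or r.keyword in text)]
```

A non-empty `minutes_id` or `image_id` on a matched record is one simple way to signal that minutes or an image should also be used.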

The speaker information acquisition unit 12a acquires speaker information transmitted from the speaker device 20. The acquired speaker information is stored in the speaker information table 11a.

The user information acquisition unit 12b acquires user information transmitted from the user terminal device 30 of a user who uses a given speaker device 20. The user information is associated with, for example, the speaker information of the speaker device 20 used by the user. The user information acquisition unit 12b may acquire the user information directly from the user terminal device 30 or via the speaker device 20.

Specifically, as shown in FIG. 9A, when the response server device 10 acquires user information directly from the user terminal device 30, the user reads, for example, a QR code (registered trademark) posted in the conference room or displayed on the display device 300 with the reading function of the user terminal device 30. The speaker information contained in the QR code is then added to the user information, and the user information is transmitted to the response server device 10.

As shown in FIG. 9B, when the response server device 10 acquires user information via the speaker device 20, the user terminal device 30 transmits, for example, its own MAC address or BD address to the speaker device 20 as user information. The user information is thereby associated with the speaker information, and both are transmitted to the response server device 10.
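Both acquisition paths end with the same pairing of a user ID and a speaker ID. The sketch below illustrates this, assuming a JSON QR payload of the form `{"speaker_id": ...}` and a flat dictionary message. The payload format, message shape, and function names are all hypothetical; the patent does not specify an encoding.

```python
import json
from typing import Dict


def register_via_qr(qr_payload: str, user_id: str) -> Dict[str, str]:
    """FIG. 9A path: the terminal attaches speaker info read from a QR code to its user info."""
    speaker = json.loads(qr_payload)  # assumed payload format: {"speaker_id": "..."}
    return {"user_id": user_id, "speaker_id": speaker["speaker_id"]}


def register_via_speaker(own_speaker_id: str, device_address: str) -> Dict[str, str]:
    """FIG. 9B path: the speaker pairs a received MAC/BD address with its own speaker ID."""
    return {"user_id": device_address, "speaker_id": own_speaker_id}
```

Because both paths produce the same shape of message, the user information acquisition unit 12b can treat them uniformly, whichever route the message arrives by.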

The reservation information acquisition unit 12c acquires reservation information indicating the reservation status of conference rooms from the reservation management system (not shown). The reservation information includes, for example, the date and time, users, conference room, and purpose of use. This allows the response server device 10 to determine, for example, which user will use which conference room, when, and for what purpose.
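Given reservation records with the fields listed above, finding the meeting currently held in a room is an interval check. The following is a minimal sketch that assumes naive local `datetime` values and half-open `[start, end)` intervals. The record layout is illustrative and not the reservation management system's actual format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Reservation:
    room: str
    start: datetime
    end: datetime
    users: FrozenSet[str]  # user IDs of the scheduled participants
    purpose: str


def active_reservation(reservations: Iterable[Reservation], room: str,
                       now: datetime) -> Optional[Reservation]:
    """Return the reservation for `room` whose [start, end) interval contains `now`."""
    for r in reservations:
        if r.room == room and r.start <= now < r.end:
            return r
    return None
```

The half-open interval keeps back-to-back bookings in the same room from both matching at the boundary minute.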

Based on the user information acquired by the user information acquisition unit 12b, the recognition unit 13 refers to the speaker information table 11a and recognizes the user terminal device 30 in association with the speaker device 20 used by the user. This allows the response server device 10 to determine which user is using which speaker in which conference room. The response server device 10 may output the users corresponding to the user information recognized by the recognition unit 13 to the display device 300 connected to the speaker device 20, which makes it possible to check the attendance of meeting participants.
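The association performed by the recognition unit 13 amounts to a join between the received (user, speaker) pairs and the speaker locations in table 11a. This sketch is illustrative only: it uses plain dictionaries and tuples instead of real tables, and the function name is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def attendance(registrations: Iterable[Tuple[str, str]],
               speaker_locations: Dict[str, str], room: str) -> List[str]:
    """List the user IDs recognized in `room`.

    registrations: (user_id, speaker_id) pairs received from terminals or speakers.
    speaker_locations: speaker_id -> installation location, as held in table 11a.
    """
    # A set removes duplicates when the same user registers more than once.
    return sorted({user for user, speaker in registrations
                   if speaker_locations.get(speaker) == room})
```

The resulting list is the kind of attendance summary that could be shown on the display device 300.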

When the response content specifying unit 16 (described later) determines that minutes are to be used, the minutes identification unit 14 refers to the minutes information table 11c and identifies predetermined minutes created at a past meeting, according to the response content from the response content specifying unit 16. The minutes identification unit 14 may output the identified minutes, via the transmission unit 18, to the display device 300 connected to the speaker device 20. The user can then check the minutes of the previous meeting on the display device 300, which helps the meeting proceed smoothly.

When the response content specifying unit 16 (described later) determines that an image is to be used, the image identification unit 15 refers to the image information table 11d and identifies a predetermined image according to the response content from the response content specifying unit 16. The image identification unit 15 may output the identified image, via the transmission unit 18, to the display device 300 connected to the speaker device 20. This helps discussion in the meeting proceed smoothly.

The response content specifying unit 16 refers to the response content table 11e and specifies response content based on the user information associated with the speaker information. Specifically, it identifies the users present in the conference room based on the user information and specifies response content appropriate for those users. For example, when a user scheduled to attend a meeting identified from the reservation information enters the conference room, the response content specifying unit 16 refers to the response content table 11e based on the user information acquired by the user information acquisition unit 12b and specifies, for example, response content that confirms the user's name. Audio information generated from the specified response content is transmitted to the speaker device 20 via the transmission unit 18. In this way, voice can be actively output to the user through the speaker device 20.

Subsequently, when it is determined, for example from the acquired user information and reservation information, that all users scheduled to attend the meeting are present, the response content specifying unit 16 refers to the response content table 11e and specifies response content that confirms the participants' names. Further, when it is determined, for example, that the end time of the meeting is approaching, it refers to the response content table 11e and specifies response content asking whether to extend the use of the conference room. Audio information generated from the specified response content is transmitted to the speaker device 20.

The response content specifying unit 16 also determines, based on the specified response content, whether to use minutes information indicating past meeting minutes or an image corresponding to the meeting.
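The proactive triggers described for the response content specifying unit 16 can be sketched as simple rules. In this sketch, a scheduled participant entering the room, all scheduled participants being present, and the reserved end time approaching each produce a response key. The key strings, the five-minute warning window, and the function name are assumptions for illustration. Each key would then be looked up in the response content table 11e.

```python
from datetime import datetime, timedelta
from typing import List, Set


def proactive_prompts(expected: Set[str], present: Set[str], newly_entered: Set[str],
                      now: datetime, end: datetime,
                      warn_before: timedelta = timedelta(minutes=5)) -> List[str]:
    """Return hypothetical response keys to look up in table 11e, in speaking order."""
    prompts = []
    # A scheduled participant has just entered: confirm that user's name.
    for user in sorted(newly_entered & expected):
        prompts.append("confirm_name:" + user)
    # Everyone on the reservation is now present: confirm the participant list.
    if expected and expected <= present:
        prompts.append("confirm_participants")
    # The reserved end time is near: ask whether to extend the room booking.
    if end - warn_before <= now < end:
        prompts.append("ask_extension")
    return prompts
```

This function is stateless. A real deployment would also need to remember which prompts have already been spoken, so that polling it repeatedly does not repeat them.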

The analysis unit 17 analyzes audio information on the voices of users speaking in the meeting. Specifically, for example, the analysis unit 17 analyzes the emotions expressed by meeting participants based on the audio information. The analysis unit 17 trains a classifier using, for example, a corpus tagged with emotions. It converts the user's audio information into text information and splits the text into words (morphemes) using, for example, a predetermined natural language analysis technique. Inputting these words into the classifier yields the emotions contained in the audio information.
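The patent names no particular classifier, so the following substitutes one common choice: a multinomial naive Bayes model with add-one smoothing, trained on an emotion-tagged corpus. This is a sketch, not the analysis unit's actual method. It assumes the text has already been split into morphemes; for Japanese, that step would require a morphological analyzer.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


class NaiveBayesEmotion:
    """Multinomial naive Bayes emotion classifier with Laplace (add-one) smoothing."""

    def fit(self, corpus: Sequence[Tuple[List[str], str]]) -> "NaiveBayesEmotion":
        # corpus: (tokens, emotion_tag) pairs, i.e. an emotion-tagged corpus
        self.label_counts = Counter(label for _, label in corpus)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for tokens, label in corpus:
            self.word_counts[label].update(tokens)
        self.vocab = {w for tokens, _ in corpus for w in tokens}
        return self

    def predict(self, tokens: List[str]) -> str:
        total = sum(self.label_counts.values())
        best_label, best_score = "", -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(label) + sum of log P(word | label), smoothed by add-one
            score = math.log(n / total)
            score += sum(math.log((counts[w] + 1) / denom) for w in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space avoids numeric underflow when an utterance contains many words.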

The analysis unit 17 may also have a function of scoring the meeting based on the result of the emotion analysis. For example, if positive emotions such as "joy," "enjoyment," and "expectation" make up a relatively large share of the meeting as a whole, the meeting receives a high score; if emotions such as "disgust," "despair," and "disappointment" make up a relatively large share, it receives a low score. Information indicating the evaluation result is then transmitted to the speaker device 20. This motivates users to improve the content of their meetings. In other words, the response server device 10 actively participates in the meeting via the speaker device 20 and can make the meeting more efficient.
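The scoring rule above can be made concrete in several ways. This sketch takes the share of positive utterances minus the share of negative ones, a value in [-1, 1], and maps it linearly onto 0 to 100. The formula, the scale, and the neutral score for an empty meeting are all assumptions for illustration. The emotion label sets mirror the examples in the text.

```python
from typing import Sequence

POSITIVE = {"joy", "enjoyment", "expectation"}
NEGATIVE = {"disgust", "despair", "disappointment"}


def meeting_score(labels: Sequence[str], scale: int = 100) -> int:
    """Map (positive share - negative share), which lies in [-1, 1], onto 0..scale."""
    if not labels:
        return scale // 2  # no utterances: neutral midpoint (assumption)
    positive = sum(label in POSITIVE for label in labels) / len(labels)
    negative = sum(label in NEGATIVE for label in labels) / len(labels)
    return round((positive - negative + 1) / 2 * scale)
```

Utterances with labels outside both sets, such as "neutral," count toward the total, so they pull the score toward the midpoint.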

また、分析部１７は、例えば感情を分析した結果に基づいて会議の雰囲気に適当な音楽を出力する。例えば、会議全体につきマイナスの感情、例えば「嫌悪」「絶望」「落胆」などの感情の割合が相対的に多い場合は心が明るくなるようなジャズ音楽に関する情報をスピーカ装置２０に送信し、例えば「怒り」などの感情の割合が相対的に多い場合は心が落ち着くようなクラシック音楽に関する情報をスピーカ装置２０に送信するよう、処理を実行させる。これにより会議の内容を改善する動機をユーザに与える。すなわち応答サーバ装置１０はスピーカ装置２０を介して会議の雰囲気を能動的に改善するよう機能し、会議の効率化を図ることができる。 The analysis unit 17 also outputs, for example, music suited to the atmosphere of the conference based on the result of the emotion analysis. For example, processing is executed so that, when negative emotions such as "disgust", "despair", and "disappointment" make up a relatively large proportion of the whole meeting, information about uplifting jazz music is transmitted to the speaker device 20, and when emotions such as "anger" make up a relatively large proportion, information about calming classical music is transmitted to the speaker device 20. This gives users a motivation to improve the content of the conference. In other words, the response server device 10 works through the speaker device 20 to actively improve the atmosphere of the conference and make it more efficient.
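"Relatively large proportion" is not quantified. A sketch of the music selection, assuming a fixed share threshold of 0.3 (hypothetical) and, when both conditions hold, reacting to whichever emotion group dominates:

```python
from __future__ import annotations

from typing import Optional

NEGATIVE = {"disgust", "despair", "disappointment"}


def choose_music(emotions: list[str], threshold: float = 0.3) -> Optional[str]:
    """Pick a music genre for the room, or None if no change seems needed.

    The threshold and the genre labels are illustrative assumptions.
    """
    if not emotions:
        return None
    n = len(emotions)
    negative_share = sum(e in NEGATIVE for e in emotions) / n
    anger_share = sum(e == "anger" for e in emotions) / n
    candidates = []
    if negative_share >= threshold:
        candidates.append((negative_share, "uplifting jazz"))
    if anger_share >= threshold:
        candidates.append((anger_share, "calming classical"))
    # When both apply, respond to whichever emotion dominates.
    return max(candidates)[1] if candidates else None
```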

また、分析部１７は、例えば、ユーザ情報テーブル１１ｂを参照して、ユーザの音声情報を分析することで、役職の高いユーザの発言が相対的に多いと分析された場合、前述の応答内容特定部１６において、役職の高いユーザの発言を控える応答内容を特定し、該応答内容に基づき生成される音声情報をスピーカ装置２０に送信するよう、処理を実行させる。これにより会議を活発化し多様な意見を抽出できる。 In addition, when the analysis unit 17, for example, refers to the user information table 11b, analyzes the users' voice information, and finds that users in senior positions are speaking relatively often, the response content specifying unit 16 described above specifies response content asking those senior users to hold back, and processing is executed so that voice information generated from that response content is transmitted to the speaker device 20. This invigorates the meeting and draws out a wider range of opinions.
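The criterion for "relatively often" is not stated. A sketch under the assumption that seniority is an integer position level read from the user information table 11b, and that seniors dominate when their share of utterances exceeds their share of participants by a factor of 1.5 (both parameters hypothetical):

```python
from __future__ import annotations

from collections import Counter


def seniors_dominating(
    speakers: list[str],
    positions: dict[str, int],
    senior_level: int = 3,
    factor: float = 1.5,
) -> bool:
    """True if senior users speak disproportionately often.

    speakers: one entry per utterance (who spoke).
    positions: user -> position level, as might be read from table 11b.
    """
    participants = set(positions)
    seniors = {u for u in participants if positions[u] >= senior_level}
    if not speakers or not seniors or seniors == participants:
        return False  # nothing to compare against
    counts = Counter(speakers)
    senior_share = sum(counts[u] for u in seniors) / len(speakers)
    headcount_share = len(seniors) / len(participants)
    return senior_share > factor * headcount_share
```

Comparing against headcount share, rather than a fixed percentage, avoids flagging a meeting where most participants happen to be senior.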

送信部１８は、音声情報など各種情報をスピーカ装置２０に送信する。 The transmission unit 18 transmits various information such as voice information to the speaker device 20.
＜＜スピーカ装置２０＞＞
<<Speaker device 20>>

次に、図１０を参照して、スピーカ装置２０の機能構成について説明する。図１０は、スピーカ装置２０の機能構成の一例を示す図である。図１０に示すとおり、スピーカ装置２０は、送受信部２１、表示制御部２２、およびサポート部２３の機能を有する。スピーカ装置２０が有する機能は、図１３に示す、スピーカ装置２０の制御部（プロセッサ１０１）が、記憶装置１０３に記憶されたコンピュータプログラムを読み込み、実行し、スピーカ装置２０の各構成を制御すること等により実現される。 Next, the functional configuration of the speaker device 20 will be described with reference to FIG. 10. FIG. 10 is a diagram showing an example of the functional configuration of the speaker device 20. As shown in FIG. 10, the speaker device 20 has the functions of a transmission/reception unit 21, a display control unit 22, and a support unit 23. These functions are realized by, for example, the control unit (processor 101) of the speaker device 20 shown in FIG. 13 reading and executing a computer program stored in the storage device 103 and controlling each component of the speaker device 20.

送受信部２１は、スピーカ装置２０におけるデータの送受信を制御する。例えば、送受信部２１は、上述したマイクロフォンに入力された音声に関する音声情報を応答サーバ装置１０などの外部装置に送信する。また、送受信部２１は、応答サーバ装置１０などの外部装置からの各種情報を受信する。 The transmission/reception unit 21 controls transmission/reception of data in the speaker device 20 . For example, the transmitting/receiving unit 21 transmits voice information related to the voice input to the microphone described above to an external device such as the response server device 10 . Also, the transmission/reception unit 21 receives various information from an external device such as the response server device 10 .

表示制御部２２は、スピーカ装置２０が備える、またはスピーカ装置２０に接続された表示装置１０７（表示装置３００）の表示を制御する。例えば、表示制御部２２は、管理者の設定操作に必要な各種の画面（ユーザインタフェース）を生成し、表示装置１０７へ表示することを制御する。 The display control unit 22 controls the display of the display device 107 (display device 300 ) provided in the speaker device 20 or connected to the speaker device 20 . For example, the display control unit 22 generates various screens (user interfaces) necessary for the administrator's setting operation, and controls display on the display device 107 .

サポート部２３は、上述したサポートボタンが押下されたことを契機に、スピーカ装置２０に不具合が生じたことを示すサポート情報を生成する。サポート部２３は、送受信部２１を介してサポート情報をサポートセンター（不図示）に送信する。これにより、スピーカ装置２０に不具合が生じた場合、管理者が迅速に障害対応できる。 The support unit 23 generates support information indicating that a problem has occurred in the speaker device 20 when the above-described support button is pressed. The support unit 23 transmits the support information to a support center (not shown) via the transmission/reception unit 21. As a result, when a problem occurs in the speaker device 20, the administrator can deal with it quickly.
＜＜ユーザ端末装置３０＞＞
<<User terminal device 30>>

次に、図１１を参照して、ユーザ端末装置３０の機能構成について説明する。図１１は、ユーザ端末装置３０の機能構成の一例を示す図である。図１１に示すとおり、ユーザ端末装置３０は、入力部３１、送受信部３２、および表示制御部３３の機能を有する。ユーザ端末装置３０が有する機能は、図１３に示す、ユーザ端末装置３０の制御部（プロセッサ１０１）が、記憶装置１０３に記憶されたコンピュータプログラムを読み込み、実行し、ユーザ端末装置３０の各構成を制御すること等により実現される。 Next, the functional configuration of the user terminal device 30 will be described with reference to FIG. 11. FIG. 11 is a diagram showing an example of the functional configuration of the user terminal device 30. As shown in FIG. 11, the user terminal device 30 has the functions of an input unit 31, a transmission/reception unit 32, and a display control unit 33. These functions are realized by, for example, the control unit (processor 101) of the user terminal device 30 shown in FIG. 13 reading and executing a computer program stored in the storage device 103 and controlling each component of the user terminal device 30.

入力部３１は、ユーザによるユーザ端末装置３０に対する操作に応じて各種の情報を受け付ける。例えば、入力部３１は、ユーザによる操作に応じて、応答サーバ装置１０にアクセスするための入力を受け付ける。 The input unit 31 receives various types of information according to the user's operation on the user terminal device 30 . For example, the input unit 31 receives an input for accessing the response server device 10 according to the user's operation.

送受信部３２は、ユーザ端末装置３０におけるデータの送受信を制御する。例えば、送受信部３２は、入力部３１により入力された情報を応答サーバ装置１０などの外部装置に送信する。また、送受信部３２は、応答サーバ装置１０などの外部装置からの通知や各種情報を受信する。 The transmission/reception unit 32 controls transmission/reception of data in the user terminal device 30 . For example, the transmission/reception unit 32 transmits information input by the input unit 31 to an external device such as the response server device 10 . The transmission/reception unit 32 also receives notifications and various information from external devices such as the response server device 10 .

表示制御部３３は、ユーザ端末装置３０が備える、またはユーザ端末装置３０に接続された表示装置１０７の表示を制御する。例えば、表示制御部３３は、管理者の各種操作に必要な各種の画面（ユーザインタフェース）を生成し、表示装置１０７へ表示することを制御する。 The display control unit 33 controls the display of the display device 107 provided in the user terminal device 30 or connected to the user terminal device 30. For example, the display control unit 33 generates various screens (user interfaces) required for various operations by the administrator and controls their display on the display device 107.
＝＝動作フロー＝＝
==Operation Flow==

図１２は、応答サーバ装置１０の処理の一例を示すフロー図である。図１２を参照して、応答サーバ装置１０により実行される処理の一例を説明する。 FIG. 12 is a flowchart showing an example of the processing of the response server device 10. An example of the processing executed by the response server device 10 will be described with reference to FIG. 12.

まず、Ｓ１００において、応答サーバ装置１０はスピーカ装置２０からスピーカ情報を取得する。 First, in S100, the response server device 10 acquires speaker information from the speaker device 20.

次に、Ｓ１０１において、応答サーバ装置１０はユーザ端末装置３０またはスピーカ装置２０からユーザ情報を取得する。これにより応答サーバ装置１０はユーザがいずれの会議室にいるか認識できる。 Next, in S101, the response server device 10 acquires user information from the user terminal device 30 or the speaker device 20. As a result, the response server device 10 can recognize which conference room the user is in.

次に、Ｓ１０２において、応答サーバ装置１０はユーザ情報に基づき予約管理システムから予約情報を取得する。これにより応答サーバ装置１０はユーザが存在する会議室の予約状況を認識できる。 Next, in S102, the response server device 10 acquires reservation information from the reservation management system based on the user information. Thereby, the response server device 10 can recognize the reservation status of the conference room where the user is present.

次に、Ｓ１０３において、応答サーバ装置１０は、予約情報に基づいて、会議目的に応じた応答内容を特定する。特定された応答内容に基づき、議事録情報テーブル１１ｃまたは画像情報テーブル１１ｄを参照して、会議目的に応じた議事録情報または画像情報を特定する。 Next, in S103, the response server device 10 specifies response content according to the purpose of the conference based on the reservation information. Based on the content of the specified response, the minutes information table 11c or the image information table 11d is referenced to specify minutes information or image information according to the purpose of the meeting.

次に、Ｓ１０４において、応答サーバ装置１０は、応答内容テーブル１１ｅを参照して、ユーザ情報に基づく応答内容を特定する。 Next, in S104, the response server device 10 refers to the response content table 11e and specifies response content based on the user information.

次に、Ｓ１０５において、応答サーバ装置１０は、特定された応答内容に関する音声情報をスピーカ装置２０に送信する。例えば、スピーカ装置２０を介して、会議の冒頭に参加者の氏名を読み上げて出席確認をすることや、会議の目的を音声通知することなどを実行する。これによりスピーカ装置２０からユーザに対して能動的に発話するため、会議を円滑に進行できる。 Next, in S105, the response server device 10 transmits voice information on the identified response content to the speaker device 20. For example, at the beginning of the conference, the participants' names are read aloud through the speaker device 20 to confirm attendance, or the purpose of the conference is announced by voice. Because the speaker device 20 thus speaks to users on its own initiative, the conference can proceed smoothly.
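A sketch of the opening announcement in S105, assuming the scheduled participants come from the reservation information and the users actually present come from the user information. The function name and the wording of the announcement are hypothetical.

```python
from __future__ import annotations


def opening_announcement(purpose: str, scheduled: list[str], present: set[str]) -> str:
    """Text for the speaker device to read at the start of a meeting.

    The announcement wording is illustrative only.
    """
    here = [name for name in scheduled if name in present]
    missing = [name for name in scheduled if name not in present]
    parts = [f"The purpose of this meeting is: {purpose}."]
    if here:
        parts.append("Attendance: " + ", ".join(here) + ".")
    if missing:
        parts.append("Not yet present: " + ", ".join(missing) + ".")
    return " ".join(parts)
```

Iterating over `scheduled` rather than `present` keeps the roll call in the order of the reservation's participant list.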

次に、Ｓ１０６において、応答サーバ装置１０は、議事録情報と画像情報とをスピーカ装置２０に送信する。スピーカ装置２０は取得した議事録情報と画像情報を表示装置３００に出力する。例えば、スピーカ装置２０を介して表示装置３００に前回の議事録を表示し、会議の目的に応じた写真やグラフなどを表示する。これにより応答サーバ装置１０は会議を円滑に進行するために過去の議事録と会議に要する画像をユーザに提供できる。 Next, in S106, the response server device 10 transmits the minutes information and the image information to the speaker device 20. The speaker device 20 outputs the acquired minutes information and image information to the display device 300. For example, the minutes of the previous meeting are displayed on the display device 300 via the speaker device 20, together with photographs, graphs, and the like suited to the purpose of the meeting. As a result, the response server device 10 can provide users with past minutes and the images the meeting needs, so that the meeting proceeds smoothly.

次に、Ｓ１０７において、応答サーバ装置１０は会議が終了したか否かを判定する。具体的に述べると、応答サーバ装置１０は所定の時間になると会議が終了されたか否かを問いかける音声情報をスピーカ装置２０に送信する。例えば、スピーカ装置２０から「会議時間が終了します。延長しますか？」という音声を出力する。スピーカ装置２０から出力された音声に応じてユーザからの応答を示す音声情報を、スピーカ装置２０を介して取得する。例えば、スピーカ装置２０を介して、ユーザから「３０分延長してください」という音声情報を取得すると、応答サーバ装置１０は送信部１８を介して予約管理システムに対して会議室の予約時間を３０分延長する延長情報を送信する。その後、応答サーバ装置１０は、ユーザの音声情報に基づいて、応答内容テーブル１１ｅを参照して、会議が終了したか否かを判定する。 Next, in S107, the response server device 10 determines whether the conference has ended. Specifically, when a predetermined time is reached, the response server device 10 transmits to the speaker device 20 voice information asking whether the conference has ended. For example, the speaker device 20 outputs the voice message "The meeting time is ending. Would you like to extend it?" Voice information indicating the user's reply to this voice output is then acquired via the speaker device 20. For example, when the voice information "Please extend it by 30 minutes" is acquired from the user via the speaker device 20, the response server device 10 transmits, via the transmission unit 18, extension information to the reservation management system to extend the conference room reservation by 30 minutes. The response server device 10 then refers to the response content table 11e based on the user's voice information and determines whether the conference has ended.
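The format of the extension information sent to the reservation management system is not specified. A sketch, assuming only that the requested length is parsed from the transcribed reply and turned into a new end time; the regular expression and function names are hypothetical.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Matches e.g. "30分延長してください", full-width "３０分", or "15 minutes".
_EXTENSION = re.compile(r"(\d+)\s*(?:分|minutes?)")


def parse_extension_minutes(utterance: str) -> Optional[int]:
    """Return the requested extension in minutes, or None if none was given."""
    match = _EXTENSION.search(utterance)
    # For str patterns, \d matches any Unicode decimal digit (including
    # full-width digits), and int() accepts those digits as well.
    return int(match.group(1)) if match else None


def extended_end_time(end_time: datetime, utterance: str) -> datetime:
    """New reservation end time to report to the reservation management system."""
    minutes = parse_extension_minutes(utterance)
    return end_time if minutes is None else end_time + timedelta(minutes=minutes)
```

Accepting full-width digits matters here because Japanese speech-to-text output may use either digit width.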

会議が終了していないと判定した場合（Ｓ１０７：ＮＯ）、Ｓ１０４から処理を繰り返す。これによりユーザの音声情報に応じた応答内容を特定する。 If it is determined that the conference has not ended (S107: NO), the process is repeated from S104, so that response content corresponding to the user's voice information is identified.
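The loop formed by S104 to S107 can be sketched as follows, assuming each utterance arrives as text, a plain dictionary stands in for the response content table 11e, and the end decision is a lookup in a set of end phrases (all hypothetical simplifications; S106 is omitted).

```python
from __future__ import annotations


def run_meeting_loop(
    utterances: list[str],
    response_table: dict[str, str],
    end_phrases: set[str],
) -> list[str]:
    """Sketch of the S104-S107 loop of FIG. 12.

    Returns the responses that would be sent to the speaker device.
    """
    sent: list[str] = []
    for text in utterances:
        reply = response_table.get(text)   # S104: identify response content
        if reply is not None:
            sent.append(reply)             # S105: send voice information
        if text in end_phrases:            # S107: YES -> leave the loop for S108
            break
        # S107: NO -> take the next utterance, i.e. back to S104
    return sent
```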

会議が終了したと判定した場合（Ｓ１０７：ＹＥＳ）、会議内容を分析する（Ｓ１０８）。分析結果をスピーカ装置２０に送信する。これによりユーザそれぞれが会議において改善すべき点を把握できる。例えば、「本日の会議は７０点です」や「Ｂ部長話しすぎです」という音声を出力する。 If it is determined that the conference has ended (S107: YES), the content of the conference is analyzed (S108) and the analysis result is transmitted to the speaker device 20. This allows each user to grasp what should be improved in the conference. For example, voice messages such as "Today's meeting scored 70 points" or "Manager B is talking too much" are output.
＝＝音声通知システム１のハードウェア構成＝＝
==Hardware Configuration of Voice Notification System 1==

図１３を参照して、応答サーバ装置１０、スピーカ装置２０およびユーザ端末装置３０をコンピュータ１００により実現する場合のハードウェア構成の一例を説明する。なお、それぞれの装置の機能は、複数台の装置に分けて実現することもできる。また、スピーカ装置２０における一部のハードウェア構成については上述したとおりである。 An example of a hardware configuration when the response server device 10, the speaker device 20 and the user terminal device 30 are implemented by the computer 100 will be described with reference to FIG. The function of each device can also be implemented by being divided into a plurality of devices. Part of the hardware configuration of the speaker device 20 is as described above.

図１３は、コンピュータのハードウェア構成の一例を示す図である。図１３に示すように、コンピュータ１００は、プロセッサ１０１と、メモリ１０２と、記憶装置１０３と、入力Ｉ／Ｆ部１０４と、データＩ／Ｆ部１０５と、通信Ｉ／Ｆ部１０６、及び表示装置１０７を含む。 FIG. 13 is a diagram illustrating an example of the hardware configuration of a computer. As shown in FIG. 13, the computer 100 includes a processor 101, a memory 102, a storage device 103, an input I/F unit 104, a data I/F unit 105, a communication I/F unit 106, and a display device 107.

プロセッサ１０１は、メモリ１０２に記憶されているプログラムを実行することによりコンピュータ１００における各種の処理を制御する制御部である。 The processor 101 is a control unit that controls various processes in the computer 100 by executing programs stored in the memory 102 .

メモリ１０２は、例えばＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）等の記憶媒体である。メモリ１０２は、プロセッサ１０１によって実行されるプログラムのプログラムコードや、プログラムの実行時に必要となるデータを一時的に記憶する。 The memory 102 is a storage medium such as a RAM (Random Access Memory). The memory 102 temporarily stores program codes of programs executed by the processor 101 and data necessary for executing the programs.

記憶装置１０３は、例えばハードディスクドライブ（ＨＤＤ）やフラッシュメモリ等の不揮発性の記憶媒体である。記憶装置１０３は、オペレーティングシステムや、上記各構成を実現するための各種プログラムを記憶する。 The storage device 103 is a non-volatile storage medium such as a hard disk drive (HDD) or flash memory. The storage device 103 stores an operating system and various programs for realizing each configuration described above.

入力Ｉ／Ｆ部１０４は、ユーザからの入力を受け付けるためのデバイスである。入力Ｉ／Ｆ部１０４の具体例としては、キーボードやマウス、タッチパネル、各種センサ、ウェアラブル・デバイス等が挙げられる。入力Ｉ／Ｆ部１０４は、例えばＵＳＢ（ＵｎｉｖｅｒｓａｌＳｅｒｉａｌＢｕｓ）等のインタフェースを介してコンピュータ１００に接続されても良い。 The input I/F unit 104 is a device for receiving input from the user. Specific examples of the input I/F unit 104 include a keyboard, mouse, touch panel, various sensors, wearable devices, and the like. The input I/F unit 104 may be connected to the computer 100 via an interface such as USB (Universal Serial Bus).

データＩ／Ｆ部１０５は、コンピュータ１００の外部からデータを入力するためのデバイスである。データＩ／Ｆ部１０５の具体例としては、各種記憶媒体に記憶されているデータを読み取るためのドライブ装置等がある。データＩ／Ｆ部１０５は、コンピュータ１００の外部に設けられることも考えられる。その場合、データＩ／Ｆ部１０５は、例えばＵＳＢ等のインタフェースを介してコンピュータ１００へと接続される。 A data I/F unit 105 is a device for inputting data from outside the computer 100 . A specific example of the data I/F unit 105 is a drive device for reading data stored in various storage media. Data I/F unit 105 may be provided outside computer 100 . In that case, the data I/F unit 105 is connected to the computer 100 via an interface such as USB.

通信Ｉ／Ｆ部１０６は、コンピュータ１００の外部の装置と有線又は無線により、インターネットＮを介したデータ通信を行うためのデバイスである。通信Ｉ／Ｆ部１０６は、コンピュータ１００の外部に設けられることも考えられる。その場合、通信Ｉ／Ｆ部１０６は、例えばＵＳＢ等のインタフェースを介してコンピュータ１００に接続される。 The communication I/F unit 106 is a device for performing data communication with a device external to the computer 100 via the Internet N by wire or wirelessly. Communication I/F section 106 may be provided outside computer 100 . In that case, the communication I/F unit 106 is connected to the computer 100 via an interface such as USB.

表示装置１０７は、各種情報を表示するためのデバイスである。表示装置１０７の具体例としては、例えば液晶ディスプレイや有機ＥＬ（Ｅｌｅｃｔｒｏ－Ｌｕｍｉｎｅｓｃｅｎｃｅ）ディスプレイ、ウェアラブル・デバイスのディスプレイ等が挙げられる。表示装置１０７は、コンピュータ１００の外部に設けられても良い。その場合、表示装置１０７は、例えばディスプレイケーブル等を介してコンピュータ１００に接続される。また、入力Ｉ／Ｆ部１０４としてタッチパネルが採用される場合には、表示装置１０７は、入力Ｉ／Ｆ部１０４と一体化して構成することが可能である。 The display device 107 is a device for displaying various information. Specific examples of the display device 107 include a liquid crystal display, an organic EL (Electro-Luminescence) display, and a wearable device display. The display device 107 may be provided outside the computer 100. In that case, the display device 107 is connected to the computer 100 via, for example, a display cable. Further, when a touch panel is adopted as the input I/F unit 104, the display device 107 can be integrated with the input I/F unit 104.
＝＝他の実施形態＝＝
==Other Embodiments==

応答内容特定部１６は、例えば会議室の予約管理を行ってもよい。例えば、応答内容特定部１６は、予約情報に基づいて、応答内容テーブル１１ｅを参照して、スピーカ装置２０を介して会議室にユーザがいるか否かを問いかける応答内容を特定する。これにつき、図１４を参照して具体的に述べると、まず、応答サーバ装置１０は予約管理システムから予約情報を取得する（Ｓ２００）。次に、応答サーバ装置１０は、現時点において会議室が予約されているか否かを判定する（Ｓ２０１）。会議室が予約されていると判定した場合（Ｓ２０１：ＹＥＳ）、スピーカ装置２０から音声情報を取得することで、会議室に人がいるか否かを判定する（Ｓ２０２）。会議室に人がいると判定した場合（Ｓ２０２：ＹＥＳ）、応答内容特定部１６は、予約情報に基づいて、応答内容テーブル１１ｅを参照して、会議室を予約した人の氏名を問いかける応答内容（「あなたは誰ですか？」など）を特定し、送信部１８を介して該応答内容に関する音声情報をスピーカ装置２０に送信する（Ｓ２０３）。一方、会議室に人がいないと判定した場合（Ｓ２０２：ＮＯ）、応答内容特定部１６は誰かがいるかを問いかける応答内容（「誰かいますか？」など）を特定し、送信部１８を介して該応答内容に関する音声情報をスピーカ装置２０に送信する。ここで、スピーカ装置２０からの問いかけに対して返答がなかった場合、予約管理システムの予約をキャンセルする（Ｓ２０４）。また、会議室が予約されていない場合（Ｓ２０１：ＮＯ）、スピーカ装置２０から音声情報を取得することで、会議室に人がいるか否かを判定する（Ｓ２０５）。会議室に人がいると判定した場合（Ｓ２０５：ＹＥＳ）、応答内容特定部１６は予約するか否かを問いかける応答内容（「予約しますか？」など）を特定し、送信部１８を介して該応答内容に関する音声情報をスピーカ装置２０に送信する（Ｓ２０６）。これにより応答サーバ装置１０は会議室の予約状況に対して能動的に機能するため、会議室の有効活用を図ることができる。 The response content specifying unit 16 may also, for example, manage conference room reservations. For example, the response content specifying unit 16 refers to the response content table 11e based on the reservation information and specifies response content that asks, via the speaker device 20, whether a user is present in the conference room. Specifically, referring to FIG. 14, the response server device 10 first acquires reservation information from the reservation management system (S200). Next, the response server device 10 determines whether the conference room is currently reserved (S201). If it determines that the conference room is reserved (S201: YES), it determines whether anyone is in the conference room by acquiring voice information from the speaker device 20 (S202). If it determines that someone is in the conference room (S202: YES), the response content specifying unit 16 refers to the response content table 11e based on the reservation information, specifies response content asking for the name of the person who reserved the conference room (such as "Who are you?"), and transmits voice information on that response content to the speaker device 20 via the transmission unit 18 (S203). If instead it determines that no one is in the conference room (S202: NO), the response content specifying unit 16 specifies response content asking whether anyone is present (such as "Is anyone there?") and transmits voice information on that response content to the speaker device 20 via the transmission unit 18. If there is no reply to this question from the speaker device 20, the reservation in the reservation management system is canceled (S204). If the conference room is not reserved (S201: NO), the response server device 10 determines whether anyone is in the conference room by acquiring voice information from the speaker device 20 (S205). If it determines that someone is in the conference room (S205: YES), the response content specifying unit 16 specifies response content asking whether to make a reservation (such as "Would you like to make a reservation?") and transmits voice information on that response content to the speaker device 20 via the transmission unit 18 (S206). Because the response server device 10 thus acts on the reservation status of the conference room on its own initiative, conference rooms can be used effectively.
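The branching of FIG. 14 (S200 to S206) reduces to a small decision function. A sketch, assuming presence detection and the reply check have already been performed elsewhere; the function name, the `got_reply` flag and the "cancel" action string are hypothetical.

```python
from typing import Optional, Tuple


def check_room(
    reserved: bool, people_present: bool, got_reply: bool = False
) -> Tuple[Optional[str], Optional[str]]:
    """Return (prompt for the speaker device, action for the reservation system).

    got_reply says whether anyone answered the "Is anyone there?" prompt
    within the waiting period (the waiting period itself is an assumption).
    """
    if reserved:                                       # S201: YES
        if people_present:                             # S202: YES
            return ("Who are you?", None)              # S203
        if got_reply:                                  # S202: NO, but someone answered
            return ("Is anyone there?", None)
        return ("Is anyone there?", "cancel")          # S204: no reply, cancel
    if people_present:                                 # S201: NO, S205: YES
        return ("Would you like to make a reservation?", None)  # S206
    return (None, None)                                # unreserved and empty
```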

応答サーバ装置１０は、プレゼン特定実行部（不図示）の機能をさらに有していてもよい。プレゼン特定実行部は、例えばクラウド上で公開されているプレゼン資料データベース（不図示）を参照し、ユーザ情報に基づき応答内容特定部１６で特定された応答内容に基づいて、プレゼン資料を特定する。例えば、応答内容における所定のキーワードを特定し、該キーワードに関するプレゼン資料を特定する。プレゼン特定実行部は、送信部１８を介して、特定されたプレゼン資料に関する情報をスピーカ装置２０に送信するとともに、プレゼン資料に記載されているテキスト情報を音声情報に変換し、該音声情報をスピーカ装置２０に送信する。これにより応答サーバ装置１０はスピーカ装置２０を介してユーザに対して能動的にプレゼン資料を提供するとともに、プレゼン資料の内容につき音声案内することができるため、会議の効率化を図ることができる。 The response server device 10 may further have the function of a presentation identification execution unit (not shown). The presentation identification execution unit refers to, for example, a presentation material database (not shown) published on the cloud, and identifies presentation materials based on the response content that the response content specifying unit 16 identified from the user information. For example, it identifies a predetermined keyword in the response content and identifies presentation materials related to that keyword. The presentation identification execution unit transmits information about the identified presentation materials to the speaker device 20 via the transmission unit 18, converts the text in the presentation materials into voice information, and transmits that voice information to the speaker device 20. As a result, the response server device 10 can actively provide presentation materials to users via the speaker device 20 and give spoken guidance on their content, making the conference more efficient.
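The interface of the presentation material database is not described. A sketch of keyword-based lookup, assuming a hypothetical in-memory mapping from material titles to keywords and ranking by the number of keywords that occur in the response content:

```python
from __future__ import annotations


def find_presentations(response_text: str, materials: dict[str, list[str]]) -> list[str]:
    """Rank presentation materials by keyword hits in the response text.

    materials maps a material title to its keywords; it stands in for the
    cloud-hosted presentation material database.
    """
    text = response_text.lower()
    scored = []
    for title, keywords in materials.items():
        hits = sum(1 for keyword in keywords if keyword.lower() in text)
        if hits:
            scored.append((hits, title))
    # Most hits first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]
```

Plain substring matching is used only to keep the sketch short; a real system would more likely match on morphemes or use a search index.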

また、プレゼン特定実行部（不図示）は、予約情報に基づいて、会議室の利用目的におけるキーワードを特定し、該キーワードに関するプレゼン資料を特定してもよい。プレゼン特定実行部は、送信部１８を介して特定されたプレゼン資料に関する情報をスピーカ装置２０に送信する。そして、プレゼン資料に記載されているテキスト情報を音声情報に変換し、送信部１８を介して該音声情報をスピーカ装置２０に送信する。これにより応答サーバ装置１０において会議の目的に適したプレゼン資料を自動的に特定されるため、会議を円滑に進行できる。 The presentation identification execution unit (not shown) may also identify, based on the reservation information, a keyword related to the purpose for which the conference room is being used, and identify presentation materials related to that keyword. The presentation identification execution unit transmits information about the identified presentation materials to the speaker device 20 via the transmission unit 18. It then converts the text in the presentation materials into voice information and transmits that voice information to the speaker device 20 via the transmission unit 18. Because the response server device 10 thus identifies presentation materials suited to the purpose of the conference automatically, the conference can proceed smoothly.

応答内容特定部１６は、例えばスピーカ装置２０を介してプレゼン実行中におけるユーザの質問に応答する応答内容を特定する機能を有していてもよい。まず、応答内容特定部１６は質問に関する音声情報をテキスト情報に変換する。次に、応答内容特定部１６は該テキスト情報に対応する応答内容を応答内容テーブル１１ｅから取得する。具体的に述べると、例えばテキスト情報を解析してキーワードを抽出する。抽出されたキーワードに基づいて、応答内容テーブル１１ｅを検索するとともに、質問の種別（人名、地名、数量など）を特定する。検索により応答内容テーブル１１ｅから抽出された応答内容の中から、特定された質問の種別に対応する言葉を特定することで、回答を特定する。これにより応答サーバ装置１０は能動的にプレゼンを実行するとともに、ユーザの質問に対する回答をも行うため会議を円滑に進行できる。 The response content specifying unit 16 may have a function of specifying the response content to the user's question during the presentation through the speaker device 20, for example. First, the response content identification unit 16 converts the voice information regarding the question into text information. Next, the response content specifying unit 16 acquires the response content corresponding to the text information from the response content table 11e. Specifically, for example, text information is analyzed to extract keywords. Based on the extracted keyword, the response content table 11e is searched, and the type of question (person's name, place name, quantity, etc.) is specified. An answer is specified by specifying words corresponding to the specified question type from among the response contents extracted from the response content table 11e by searching. As a result, the response server device 10 actively executes the presentation and also answers the user's questions, so that the conference can proceed smoothly.
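A sketch of the question-answering step, assuming hypothetical cue words for each question type named in the text (person, place, quantity) and a list of dictionaries standing in for the response content table 11e, each entry carrying its keywords plus one field per question type.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical cue words for each question type.
QUESTION_CUES = {
    "person": ("who", "誰"),
    "place": ("where", "どこ"),
    "quantity": ("how many", "how much", "何人", "いくつ"),
}


def question_type(text: str) -> Optional[str]:
    lowered = text.lower()
    for qtype, cues in QUESTION_CUES.items():
        if any(cue in lowered for cue in cues):
            return qtype
    return None


def answer(text: str, table: list[dict]) -> Optional[str]:
    """Find an entry whose keywords appear in the question and return the
    field that matches the question type."""
    qtype = question_type(text)
    if qtype is None:
        return None
    lowered = text.lower()
    for entry in table:
        if qtype in entry and any(k in lowered for k in entry["keywords"]):
            return entry[qtype]
    return None
```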

応答サーバ装置１０は、例えば会議の議事録を作成する議事録作成部（不図示）の機能をさらに有していてもよい。議事録作成部は、例えばスピーカ装置２０を介して取得したユーザの音声情報をテキスト情報に変換し、所定の様式に議事録として該テキスト情報を入力する。テキスト情報が入力された所定の様式を示す情報をユーザ端末装置３０に送信するよう、処理を実行させる。これによりユーザにおいて議事録作成にかかる作業を軽減できる。 The response server device 10 may further have a function of a minutes creation unit (not shown) that creates minutes of a conference, for example. The minutes creation unit converts, for example, the user's voice information acquired via the speaker device 20 into text information, and inputs the text information as minutes in a predetermined format. Processing is executed so as to transmit to the user terminal device 30 information indicating a predetermined format in which text information is input. As a result, it is possible to reduce the work required for the user to create the minutes.
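The "predetermined format" for the minutes is not specified. A sketch that fills a simple plain-text template, assuming the transcription has already produced (speaker, text) pairs; the template layout and function name are hypothetical.

```python
from __future__ import annotations

from datetime import date


def build_minutes(title: str, day: date, transcript: list[tuple[str, str]]) -> str:
    """Fill a plain-text minutes template from (speaker, text) pairs.

    The template is an assumption; only "a predetermined format" is specified.
    """
    lines = [f"Minutes: {title}", f"Date: {day.isoformat()}", ""]
    lines += [f"- {speaker}: {text}" for speaker, text in transcript]
    return "\n".join(lines)
```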

応答サーバ装置１０は、例えばユーザ端末装置３０からユーザ情報を取得することに代えてスピーカ装置２０で取得するユーザの音声をユーザ情報として取得してもよい。この場合、応答サーバ装置１０はユーザごとの声紋に関する声紋情報を格納する声紋情報データベース（不図示）を備え、取得した音声情報と声紋情報とを照合してユーザを特定する。これによりユーザ端末装置３０を所持していないユーザを認識できるため、音声通知システム１の確実な運用を実現できる。 For example, instead of acquiring user information from the user terminal device 30, the response server device 10 may acquire the user's voice captured by the speaker device 20 as user information. In this case, the response server device 10 has a voiceprint information database (not shown) that stores voiceprint information for each user, and identifies the user by matching the acquired voice information against the voiceprint information. Because users who do not carry a user terminal device 30 can then be recognized, the voice notification system 1 can be operated reliably.
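The matching method is not specified. A common approach, assumed here, represents each voiceprint as a fixed-length embedding vector (extraction from audio is outside this sketch) and picks the enrolled user with the highest cosine similarity above a threshold. The 0.8 threshold and the function names are hypothetical.

```python
from __future__ import annotations

import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(
    embedding: list[float], voiceprints: dict[str, list[float]], threshold: float = 0.8
) -> Optional[str]:
    """Return the enrolled user whose voiceprint is most similar, or None
    if no similarity reaches the threshold."""
    best_user, best_sim = None, threshold
    for user, reference in voiceprints.items():
        sim = cosine_similarity(embedding, reference)
        if sim >= best_sim:
            best_user, best_sim = user, sim
    return best_user
```

Returning None below the threshold lets the system fall back to another identification route instead of guessing.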

応答サーバ装置１０は、例えばユーザ端末装置３０からユーザ情報を取得することに代えてカメラ装置（不図示）から取得するユーザの顔に関する顔情報をユーザ情報として取得してもよい。この場合、応答サーバ装置１０はユーザごとの顔情報を格納する顔情報データベース（不図示）を備え、取得した顔情報と顔情報データベースに格納されている顔情報とを照合してユーザを特定する。これによりユーザ端末装置３０を所持していないユーザを認識できるため、音声通知システム１の確実な運用を実現できる。 For example, instead of acquiring user information from the user terminal device 30, the response server device 10 may acquire, as user information, face information about the user's face obtained from a camera device (not shown). In this case, the response server device 10 has a face information database (not shown) that stores face information for each user, and identifies the user by matching the acquired face information against the face information stored in the face information database. Because users who do not carry a user terminal device 30 can then be recognized, the voice notification system 1 can be operated reliably.
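A sketch of the face matching, assuming each face is represented as an embedding vector and compared by Euclidean distance, a common convention for face embeddings. The 0.6 distance limit and the function name are hypothetical.

```python
from __future__ import annotations

import math
from typing import Optional


def match_face(
    embedding: list[float], face_db: dict[str, list[float]], max_distance: float = 0.6
) -> Optional[str]:
    """Return the enrolled user with the nearest face embedding, or None if
    no stored embedding lies within max_distance."""
    best_user, best_dist = None, max_distance
    for user, reference in face_db.items():
        dist = math.dist(embedding, reference)  # Euclidean distance (Python 3.8+)
        if dist <= best_dist:
            best_user, best_dist = user, dist
    return best_user
```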

応答サーバ装置１０は、例えば各種機能を実行するタイミングを計る計時部（不図示）をさらに有していてもよい。計時部において所定の時間や所定の時間経過を計ることで、応答サーバ装置１０は所定の時間に所定の機能を実行し所定の時間経過時に所定の機能を実行することができる。これにより応答サーバ装置１０は適切なタイミングで能動的に機能を発揮できるため、会議を円滑に進行できる。 The response server device 10 may further include, for example, a timer unit (not shown) that measures when to execute its various functions. Because the timer unit tracks predetermined times of day and the elapse of predetermined periods, the response server device 10 can execute a given function at a predetermined time or once a predetermined period has elapsed. This lets the response server device 10 act on its own at appropriate moments, so the conference can proceed smoothly.
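A sketch of the timer unit as a priority queue of scheduled actions driven by a virtual clock (for example, minutes since the meeting started). The class name and the virtual-clock design are assumptions; a deployed system would read real time instead.

```python
from __future__ import annotations

import heapq
import itertools
from typing import Any, Callable


class MeetingTimer:
    """Run scheduled actions once a (virtual) clock reaches their time."""

    def __init__(self) -> None:
        self._queue: list[tuple[float, int, Callable[[], Any]]] = []
        self._order = itertools.count()  # tie-breaker so callables are never compared

    def schedule(self, at: float, action: Callable[[], Any]) -> None:
        heapq.heappush(self._queue, (at, next(self._order), action))

    def advance_to(self, now: float) -> list[Any]:
        """Fire every action due at or before `now`, in time order."""
        fired = []
        while self._queue and self._queue[0][0] <= now:
            _, _, action = heapq.heappop(self._queue)
            fired.append(action())
        return fired
```

For instance, an action scheduled at minute 55 of a 60-minute reservation could trigger the extension question described for S107.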

上記において、スピーカ装置２０は会議室に設置されているものとして説明したがこれに限定されない。スピーカ装置２０を不特定多数のユーザが利用する場所に設置できる。これにより、音声通知システム１は、例えば、個人宅、集合住宅のエントランス、お店、集会場など様々な場所において、能動的にユーザに対して発話する。様々な場所でユーザに対して能動的に音声出力することで、ユーザの発言を促すことができる。 In the above description, the speaker device 20 has been described as being installed in the conference room, but the present invention is not limited to this. The speaker device 20 can be installed in a place used by an unspecified number of users. Thereby, the voice notification system 1 actively speaks to the user at various places such as a private house, an entrance of a collective housing, a shop, a meeting place, and the like. By actively outputting voice to the user at various places, it is possible to encourage the user to speak.

なお、前述した実施の形態は本発明の理解を容易にするためのものであり、本発明を限定して解釈するためのものではない。本発明はその趣旨を逸脱することなく変更、改良され得るとともに、本発明にはその等価物も含まれる。 It should be noted that the above-described embodiment is for facilitating understanding of the present invention, and is not for limiting interpretation of the present invention. The present invention may be modified and improved without departing from its spirit, and the present invention also includes equivalents thereof.

１…音声通知システム、１０…応答サーバ装置、１１…記憶部、１２ａ…スピーカ情報取得部、１２ｂ…ユーザ情報取得部、１２ｃ…予約情報取得部、１３…認識部、１４…議事録特定部、１５…画像特定部、１６…応答内容特定部、１７…分析部、１８…送信部、２０…スピーカ装置、３０…ユーザ端末装置 Reference signs list: 1... voice notification system, 10... response server device, 11... storage unit, 12a... speaker information acquisition unit, 12b... user information acquisition unit, 12c... reservation information acquisition unit, 13... recognition unit, 14... minutes identification unit, 15... image identification unit, 16... response content identification unit, 17... analysis unit, 18... transmission unit, 20... speaker device, 30... user terminal device

Claims

1. An information processing system comprising:
an acquisition unit that acquires, from a speaker device having a microphone and a speaker, user information capable of identifying a user who accessed the speaker device upon entering a conference room, and acquires, based on the user information, reservation information regarding the reservation status of the conference room from a reservation management system that manages reservations of the conference room;
a response content identification unit that identifies, based on the user information and the reservation information acquired by the acquisition unit, that the users scheduled to participate in a conference in the conference room have gathered and, when it is identified that the users have gathered, identifies response content indicating that the conference is to be held and that the participants' names are to be confirmed; and
a transmission unit that transmits audio information based on the response content identified by the response content identification unit to the speaker device so as to cause the speaker device to output voice in accordance with the response content.

2. The information processing system according to claim 1, wherein the response content identification unit identifies response content asking whether to extend the usage time of the conference room at the point when it is identified that a predetermined time before the end time of the conference room's use has been reached.

3. The information processing system according to claim 1, wherein:
the acquisition unit acquires the user information related to the user's voice from the speaker device;
the information processing system further comprises an analysis unit that analyzes, based on the user information related to the user's voice, emotions contained in the speech of users participating in the conference in the conference room, and gives a score to the conference in the conference room based on the analyzed emotions; and
the transmission unit transmits analysis information indicating the score, which is the result of analysis by the analysis unit, to the speaker device.

4. The information processing system according to any one of claims 1 to 3, wherein:
the response content identification unit identifies, based on the reservation information, image response content, which is response content indicating that a predetermined image is to be used;
the information processing system further comprises an image identification unit that identifies the predetermined image based on the image response content identified by the response content identification unit; and
the transmission unit transmits image information regarding the predetermined image to the speaker device so that the predetermined image is output to a display device connected to the speaker device.

5. The information processing system according to any one of claims 1 to 4, wherein
the response content identifying unit identifies, based on the reservation information, minutes response content, which is response content indicating that past minutes are to be used,
the information processing system further comprises a minutes identifying unit that identifies predetermined minutes information, based on the minutes response content identified by the response content identifying unit, from a past database holding minutes information on the past minutes, and
the transmission unit transmits the minutes information to the speaker device so that the minutes are output to a display device connected to the speaker device.
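The minutes identifying unit's lookup against the past database might be sketched with Python's built-in `sqlite3` module. This assumes that past minutes are matched to the reservation by meeting title and that the most recent entry is wanted. The `minutes` table schema and the function `find_past_minutes` are hypothetical stand-ins for whatever database the system actually uses.

```python
import sqlite3
from typing import Optional, Tuple


def find_past_minutes(conn: sqlite3.Connection,
                      title: str) -> Optional[Tuple[str, str]]:
    """Return (held_on, body) of the latest minutes whose title matches the reservation."""
    return conn.execute(
        "SELECT held_on, body FROM minutes WHERE title = ? "
        "ORDER BY held_on DESC LIMIT 1",
        (title,),
    ).fetchone()


# In-memory stand-in for the past database (schema is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE minutes (title TEXT, held_on TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO minutes VALUES (?, ?, ?)",
    [("Weekly sync", "2019-07-08", "Agreed on release date."),
     ("Weekly sync", "2019-07-15", "Reviewed test results."),
     ("Budget review", "2019-07-10", "Approved Q3 budget.")],
)
latest = find_past_minutes(conn, "Weekly sync")
```

Storing dates as ISO 8601 strings keeps `ORDER BY held_on DESC` chronologically correct without a separate date type.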

6. The information processing system according to any one of claims 1 to 5, wherein
the response content identifying unit identifies, based on the reservation information, presentation response content, which is response content indicating that a predetermined presentation material is to be used,
the information processing system further comprises a presentation identification execution unit that identifies, from a predetermined database, presentation material information indicating the presentation material based on the presentation response content, and
the transmission unit transmits the presentation material information so that the speaker device outputs the presentation material to a display device connected to the speaker device.
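The claim states only that presentation material information is sent so the speaker device can show it on a connected display. As one possible shape for that transmission, the sketch bundles a spoken prompt with the display payload in a single JSON message. The reservation-ID keyed `PRESENTATION_DB`, the message fields, and `build_presentation_message` are illustrative assumptions, not the patent's format.

```python
import json
from typing import Dict, List

# Hypothetical "predetermined database": reservation ID -> registered presentation files.
PRESENTATION_DB: Dict[str, List[str]] = {
    "rsv-001": ["slides/q3_plan.pdf"],
}


def build_presentation_message(reservation_id: str) -> str:
    """Identify presentation material for a reservation and wrap it for the speaker device."""
    materials = PRESENTATION_DB.get(reservation_id, [])
    message = {
        # Spoken prompt only when there is something to show.
        "speech": "Shall I display the presentation material?" if materials else None,
        # Payload the speaker device forwards to the connected display device.
        "display": {"kind": "presentation", "files": materials},
    }
    return json.dumps(message)
```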

The information processing system according to any one of claims 1 to 6, further comprising the speaker device.

An information processing method in which a computer performs:
an acquisition step of acquiring, from a speaker device comprising a microphone and a speaker, user information capable of identifying a user who has accessed the speaker device upon entering a conference room, and acquiring, based on the user information, reservation information on the reservation status of the conference room from a reservation management system that manages reservations of the conference room;
a response content identification step of identifying, based on the user information and the reservation information acquired in the acquisition step, that the users scheduled to participate in the conference in the conference room have gathered, and, when it is identified that the users have gathered, identifying response content indicating confirmation of the names of the conference participants; and
a transmission step of transmitting audio information based on the response content identified in the response content identification step to the speaker device so as to cause the speaker device to output voice in accordance with the response content.
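The response content identification step above (detect that all scheduled participants have gathered, then confirm their names) can be illustrated with a minimal sketch. It assumes that the identities of users who accessed the speaker device and the scheduled attendee list from the reservation information are already available as sets of names. The functions `all_gathered` and `name_confirmation` and the wording of the confirmation are hypothetical.

```python
from typing import Iterable, Optional, Set


def all_gathered(checked_in: Set[str], scheduled: Set[str]) -> bool:
    """True once every scheduled participant has accessed the speaker device."""
    return scheduled <= checked_in


def name_confirmation(checked_in: Set[str],
                      scheduled: Iterable[str]) -> Optional[str]:
    """Return the name-confirmation text once all scheduled users are present, else None."""
    scheduled_set = set(scheduled)
    if not scheduled_set or not all_gathered(checked_in, scheduled_set):
        return None
    names = ", ".join(sorted(scheduled_set))
    return f"Everyone is here. Today's participants are {names}. Is that correct?"
```

Using a subset test means extra walk-in attendees do not block the confirmation; a stricter system might instead require an exact match.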

A program causing a computer to execute:
acquiring, from a speaker device comprising a microphone and a speaker, user information capable of identifying a user who has accessed the speaker device upon entering a conference room, and acquiring, based on the user information, reservation information on the reservation status of the conference room from a reservation management system that manages reservations of the conference room;
identifying, based on the user information and the reservation information, that the users scheduled to participate in the conference in the conference room are present, and, when it is identified that the users are present, identifying response content indicating confirmation of the names of the conference participants; and
transmitting audio information based on the identified response content to the speaker device so as to cause the speaker device to output voice in accordance with the response content.