JP7576991B2

JP7576991B2 - Data transmission/reception device and program equipped in robot

Info

Publication number: JP7576991B2
Application number: JP2021016019A
Authority: JP
Inventors: 正男山本; 敏西村
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2021-02-03
Filing date: 2021-02-03
Publication date: 2024-11-01
Anticipated expiration: 2041-02-03
Also published as: JP2022119055A

Description

The present invention relates to a data transmission/reception device and a program provided in a robot that speaks.

A television-watching robot that watches television together with a person is already known (see, for example, Non-Patent Document 1). This robot detects nearby televisions and people, then generates and speaks utterances related to the program. By listening to these utterances, a person watching television alone can enjoy it as if watching with several other people.

In addition, robot development is under way to verify how the utterances of a television-watching robot (hereinafter, "robot") stimulate conversation between people and affect television viewing (see, for example, Non-Patent Document 2). The verification is carried out by having the robot speak keywords related to the program while "two people and one robot" watch television together.

Meanwhile, as refraining from going out and keeping social distance become established as new habits because of the COVID-19 pandemic, opportunities for several people, such as friends or acquaintances, to watch television together are expected to decrease. The robot development of Non-Patent Document 2 therefore needs to take into account that the "two people and one robot" combination will occur less often.

Under these new habits, Internet-based remote conferencing tools such as Zoom (registered trademark) are also spreading rapidly as a means of communication between people.

Taking these points together, one can envisage an environment in which a "one person and one robot" combination is paired with another "one person and one robot" combination, and the two people converse over the Internet.

Specifically, the two combinations are in separate locations, the two people enjoy a conversation while watching the same television program or video, and each robot makes utterances related to the content being watched, which further enlivens the conversation between the people.

Non-Patent Document 2 describes a means for making the robot speak about the content of the program being watched. This speech means uses a cloud service such as Microsoft Azure (registered trademark) or Amazon Web Services (registered trademark) to perform caption generation, celebrity search, and image label detection in parallel on the video of the program being watched. Based on these results, the speech means outputs, as keywords, the words that are contained in a keyword dictionary prepared in advance.

That is, video is sent on the uplink from the robot to the cloud service, and text information such as captions, celebrity names, and image labels (keywords related to the video) is sent on the downlink from the cloud service to the robot. The uplink traffic volume is therefore large, while the downlink traffic volume is small.

Non-Patent Document 1: Yuta Hoshi, "TV Viewing Robot," NHK Science & Technology Research Laboratories, Giken Dayori, October 2017 issue
Non-Patent Document 2: Yuta Hagio, "Prototype and Verification of a Communication Robot that Watches TV with People," IEICE Technical Report, vol. 119, no. 446, BioX2019-63, CNR2019-46, 2020, pp. 7-12

In the environment described above, in which one "one person and one robot" combination is paired with another, both combinations watch the same program or video. The robot in each combination therefore sends the same video file to the cloud service over its uplink. This is wasteful in terms of cloud service cost.

One conceivable way to solve this problem is for one robot to send the video file to the cloud service, receive the text information, and forward the received text information to the other robot. This would halve the cloud service cost.

However, if the Internet connection environments of the two robots differ, sharing of the text information between them may be delayed, and the content of the robot's utterance may drift out of step with the video being watched.

For example, suppose that one robot sends the video file to the cloud service, receives the text information, and forwards it to the other robot, and that congestion occurs at, for instance, the Internet service provider to which that first robot is connected.

In this case, the video file sent by the first robot arrives late at the cloud service, the first robot receives the text information late, and the other robot in turn receives the text information from the first robot late. As a result, sharing of the text information between the two robots is delayed, and so is the timing of the robots' utterances.

The robot's utterance concerns video that the person watched at an earlier time. The more the utterance is delayed, the less relevant it is to the video currently being watched, and the harder it becomes to obtain the effect of stimulating conversation between the people.

The present invention has been made to solve this problem. Its object is to provide a data transmission/reception device and a program that reduce the cost of the cloud service and also suppress delays in the timing of the robot's utterances.

To solve the above problem, the data transmission/reception device of claim 1 is used in the following setting. A first user who watches a video displayed on a first display, and a first robot that speaks based on a keyword related to the video, are present at a first location. A second user who watches the video displayed on a second display, and a second robot that speaks based on the keyword related to the video, are present at a second location different from the first location. The first user and the second user converse while listening to utterances based on the same keyword. A data transmission/reception device provided in the first robot, a remote data transmission/reception device provided in the second robot, and a cloud server are connected via the Internet to form a data transmission/reception system, in which each of the data transmission/reception device and the remote data transmission/reception device acquires a data file of the video as a video file, transmits the video file to the cloud server, and receives the keyword from the cloud server. In this setting, the data transmission/reception device comprises: a transmission/reception processing unit that acquires a data file of the first user's speech as a first voice file, transmits the first voice file to the remote data transmission/reception device, receives from the remote data transmission/reception device a data file of the second user's speech as a second voice file, calculates time information Ta reflecting the degree of congestion of the Internet at the data transmission/reception device during the period from transmitting the video file to the cloud server until receiving the keyword, transmits the time information Ta to the remote data transmission/reception device, receives from the remote data transmission/reception device time information Tb reflecting the degree of congestion of the Internet at the remote data transmission/reception device, and, when it determines from the time information Ta and the time information Tb that the degree of congestion of the Internet at the data transmission/reception device is higher than at the remote data transmission/reception device, stops transmitting the video file to the cloud server and receives from the remote data transmission/reception device the keyword that the remote data transmission/reception device received from the cloud server; a voice playback unit that plays the second voice file received by the transmission/reception processing unit; and a spoken sentence playback unit that plays a spoken sentence based on the keyword received by the transmission/reception processing unit.
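As a minimal sketch of how the time information Ta might be obtained, the following Python function times the interval from sending the video file to receiving the keywords. Taking Ta to be the plain elapsed time is an assumption, since the claim only requires that Ta reflect the degree of congestion during that interval. The names `measure_round_trip`, `send_and_receive`, and `clock` are hypothetical and do not appear in the source.

```python
import time
from typing import Callable, List, Tuple


def measure_round_trip(
    send_and_receive: Callable[[bytes], List[str]],
    video_file: bytes,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[str], float]:
    """Send a video file and return (keywords, Ta).

    Ta is assumed here to be the elapsed time, in seconds, between
    sending the video file and receiving the keywords.
    `send_and_receive` is a hypothetical blocking call to the cloud server.
    """
    start = clock()
    keywords = send_and_receive(video_file)
    ta = clock() - start
    return keywords, ta
```

A monotonic clock is used so that the measurement is not disturbed by adjustments to the system time.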

The data transmission/reception device of claim 2 is the device of claim 1, in which the transmission/reception processing unit obtains time information T by subtracting the time information Ta from the time information Tb. When T is less than 0, the unit determines that the degree of congestion of the Internet at the data transmission/reception device is higher than at the remote data transmission/reception device and performs a first process: it stops transmitting the video file to the cloud server and receives from the remote data transmission/reception device the keyword that the remote device received from the cloud server. When T is greater than 0, the unit determines that the degree of congestion at the data transmission/reception device is lower than at the remote device and performs a second process: it acquires the video file, transmits it to the cloud server, and receives the keyword from the cloud server. When T is 0, the unit determines that the degree of congestion at the data transmission/reception device is the same as at the remote device and performs either the first process or the second process.
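The three-way comparison in claim 2 can be sketched as follows. The rule T = Tb − Ta and the three branches come from the claim; the names `Process` and `choose_process`, and the choice of the second process as the default when T is 0, are assumptions for illustration, since the claim permits either process in that case.

```python
from enum import Enum


class Process(Enum):
    FIRST = "stop uploading; receive keywords from the remote device"
    SECOND = "upload the video file; receive keywords from the cloud server"


def choose_process(ta: float, tb: float,
                   on_tie: Process = Process.SECOND) -> Process:
    """Select the process from Ta (own device) and Tb (remote device)."""
    t = tb - ta  # time information T
    if t < 0:
        # Own connection is more congested than the remote one.
        return Process.FIRST
    if t > 0:
        # Own connection is less congested than the remote one.
        return Process.SECOND
    # Equal congestion: the claim allows either process.
    return on_tie
```

For example, `choose_process(3.0, 1.0)` selects the first process, because the local device took longer than the remote device to obtain its keywords.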

The data transmission/reception device of claim 3 is the device of claim 2, in which, when T is less than 0, the transmission/reception processing unit determines that the degree of congestion of the Internet at the data transmission/reception device is higher than at the remote data transmission/reception device and performs the first process as follows: it stops transmitting the video file to the cloud server, receives from the remote device the keyword that the remote device received from the cloud server, waits for a preset time, and then acquires the video file, transmits it to the cloud server, and receives the keyword from the cloud server.

The data transmission/reception device of claim 4 is the device of claim 1, in which the transmission/reception processing unit obtains time information T by subtracting the time information Ta from the time information Tb. When T is less than 0, the unit determines that the degree of congestion of the Internet at the data transmission/reception device is higher than at the remote data transmission/reception device and performs a third process: it stops transmitting the video file to the cloud server, receives from the remote device the keyword that the remote device received from the cloud server, waits for a time N (where N is a preset value), then acquires the video file and transmits it to the cloud server, receives the keyword from the cloud server, calculates the time information Ta, waits until the time N has elapsed from when the video file was acquired, receives the time information Tb from the remote device, and obtains the time information T. When T is greater than 0, the unit determines that the degree of congestion at the data transmission/reception device is lower than at the remote device and performs a fourth process: it acquires the video file and transmits it to the cloud server, receives the keyword from the cloud server, calculates the time information Ta, waits until the time N has elapsed from when the video file was acquired, receives the time information Tb from the remote device, and obtains the time information T. When T is 0, the unit determines that the degree of congestion at the data transmission/reception device is the same as at the remote device and performs either the third process or the fourth process.

The data transmission/reception device of claim 5 is the device of claim 4, in which the transmission/reception processing unit sets the time N to a value greater than the preset value when the absolute value of the time information T is greater than a preset threshold, and sets the time N to the preset value when the absolute value of T is equal to or less than the threshold.
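The adjustment of the waiting time N in claim 5 can be sketched as below. The claim states only that N becomes larger than the preset value when |T| exceeds the threshold; how much larger is not specified, so the `extended_n` parameter, like the function name `next_wait_time`, is a hypothetical choice.

```python
def next_wait_time(t: float, preset_n: float, threshold: float,
                   extended_n: float) -> float:
    """Return the waiting time N for the next cycle.

    `extended_n` is an assumed value; the claim only requires that it
    be greater than the preset value.
    """
    if extended_n <= preset_n:
        raise ValueError("extended_n must be greater than preset_n")
    if abs(t) > threshold:
        # Large gap between the two connections: wait longer than usual.
        return extended_n
    # Gap within the threshold: use the preset value.
    return preset_n
```

With a preset value of 10, a threshold of 2, and an extended value of 30, a difference of T = -5 yields N = 30, while T = 1 yields N = 10.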

The data transmission/reception device of claim 6 is the device of any one of claims 1 to 5, in which, when the spoken sentence playback unit is about to start playing the spoken sentence, it waits until acquisition of the first voice file is complete if the transmission/reception processing unit is acquiring the first voice file, or waits until playback of the second voice file is complete if the voice playback unit is playing the second voice file, and plays the spoken sentence after waiting.
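One way to picture the waiting behaviour of claim 6 is a gate that holds back the robot's spoken sentence while the local user's speech is being captured or the remote user's speech is being played. The sketch below uses `threading.Event` from the Python standard library; the class `SpeechGate` and its method names are hypothetical, and the sketch does not handle a capture that restarts while the gate is waiting on remote playback.

```python
import threading
from typing import Callable


class SpeechGate:
    """Delays the robot's utterance while a person is speaking."""

    def __init__(self) -> None:
        # An event is set while the corresponding activity is idle.
        self._capture_idle = threading.Event()
        self._remote_playback_idle = threading.Event()
        self._capture_idle.set()
        self._remote_playback_idle.set()

    def capture_started(self) -> None:
        self._capture_idle.clear()

    def capture_finished(self) -> None:
        self._capture_idle.set()

    def remote_playback_started(self) -> None:
        self._remote_playback_idle.clear()

    def remote_playback_finished(self) -> None:
        self._remote_playback_idle.set()

    def speak(self, sentence: str, play: Callable[[str], None]) -> None:
        """Block until both activities are idle, then play the sentence."""
        self._capture_idle.wait()
        self._remote_playback_idle.wait()
        play(sentence)
```

Calling `speak` from a worker thread keeps the rest of the device responsive while the utterance is held back.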

Furthermore, the program of claim 7 causes a computer to function as the data transmission/reception device of any one of claims 1 to 6.

As described above, the present invention reduces the cost of the cloud service and suppresses delays in the timing of the robot's utterances.

Conceptual diagram illustrating a data transmission/reception system including a data transmission/reception device according to an embodiment of the present invention.
Block diagram showing a configuration example of the data transmission/reception device according to the embodiment of the present invention.
Block diagram showing a configuration example of the transmission/reception processing unit.
Flowchart showing a processing example of the transmission control unit.
Flowchart showing an example of video file transmission, transmission stop, and transmission resumption processing by the transmission/reception processing unit.
Diagram explaining the timing of the example of video file transmission, transmission stop, and transmission resumption processing by the transmission/reception processing unit.
Image diagram explaining the periods during which the data transmission/reception devices upload video files.
Diagram explaining data flow (a) during periods T1, T3, and T5, in which upload traffic occurs at both data transmission/reception devices.
Diagram explaining data flow (b) during periods T2 and T4, in which no upload traffic occurs at one data transmission/reception device.
Diagram explaining data flow (c) during period T6, in which no upload traffic occurs at the other data transmission/reception device.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.
[Data transmission/reception system]
First, the data transmission/reception system will be described. Fig. 1 is a conceptual diagram illustrating a data transmission/reception system including a data transmission/reception device according to an embodiment of the present invention.

This data transmission/reception system 1 comprises a data transmission/reception device 2 installed at location A (first location), a data transmission/reception device (remote data transmission/reception device) 3 installed at location B (second location), and a cloud server 4 installed in the cloud. The data transmission/reception devices 2 and 3 and the cloud server 4 are connected via the Internet 30.

Location A is an environment containing a robot (first robot) 11 equipped with the data transmission/reception device 2, a person (first user) 12, and a display (first display) 13. Location B is an environment containing a robot (second robot) 21 equipped with the data transmission/reception device 3, a person (second user) 22, and a display (second display) 23.

The same video is displayed on the display 13 at location A and the display 23 at location B. The video is, for example, a television program or a video clip. The persons 12 and 22 are people who converse with each other, such as friends, acquaintances, or a parent and child.

The person 12 (and the robot 11) at location A watches the video displayed on the display 13, and the person 22 (and the robot 21) at location B watches the video displayed on the display 23. The robot 11 speaks based on text information such as captions, celebrity names, and image labels (hereinafter, "keywords") related to the video displayed on the display 13, and the robot 21 speaks based on keywords related to the video displayed on the display 23.

The person 12 converses with the person 22 at the remote location B while listening to the robot 11's utterances, which are based on keywords related to the video displayed on the display 13. Similarly, the person 22 converses with the person 12 at the remote location A while listening to the robot 21's utterances, which are based on keywords related to the video displayed on the display 23. In other words, the persons 12 and 22 converse while listening to utterances by the robots 11 and 21, respectively, that are based on the same keywords related to the same video.

When the data transmission/reception device 2 has a better connection to the Internet 30 than the data transmission/reception device 3 (that is, when its degree of congestion is lower), it transmits a video file of the video displayed on the display 13 to the cloud server 4 via the Internet 30. The data transmission/reception device 2 then receives from the cloud server 4 the keywords related to the video in the video file. It transmits the received keywords, the voice file of the person 12 (first voice file), and so on to the data transmission/reception device 3 via the Internet 30.

Conversely, when the data transmission/reception device 2 has a worse connection to the Internet 30 than the data transmission/reception device 3 (that is, when its degree of congestion is higher), it receives the keywords related to the video in the video file from the data transmission/reception device 3. The data transmission/reception device 2 also receives the voice file of the person 22 (second voice file) and so on from the data transmission/reception device 3.

When the data transmission/reception device 2 receives keywords from the cloud server 4 or the data transmission/reception device 3, it plays a spoken sentence based on the keywords as the utterance of the robot 11. When it receives the voice file of the person 22 from the data transmission/reception device 3, it plays the speech of the person 22.

As a result, the person 12 at location A can hear, as the utterance of the robot 11, speech based on keywords related to the video displayed on the display 13, and can also hear the speech of the person 22 at location B. In other words, the person 12 can converse with the person 22 while obtaining information related to the video.

The data transmission/reception device 3 performs the same processing as the data transmission/reception device 2. As a result, the person 22 at location B can hear, as the utterance of the robot 21, speech based on keywords related to the video displayed on the display 23, and can also hear the speech of the person 12 at location A. In other words, the person 22 can converse with the person 12 while obtaining information related to the video. The data transmission/reception devices 2 and 3 are described in detail later.

The cloud server 4 receives the video file transmitted from the data transmission/reception device 2 via the Internet 30 and performs, for example, caption generation, celebrity search, and image label detection in parallel. From the processing results, the cloud server 4 then selects as keywords, for example, the words contained in a preset keyword dictionary, and transmits them, as the keywords related to the video in the video file, to the data transmission/reception device 2 that sent the video file.
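The keyword selection step can be sketched as a filter of the three processing results against the keyword dictionary. This is an illustrative sketch only: the function `select_keywords` is hypothetical, the recognition services themselves are not modelled, and splitting the caption on whitespace is an assumption (Japanese captions would need a morphological analyzer instead).

```python
from typing import Iterable, List, Set


def select_keywords(caption: str,
                    celebrity_names: Iterable[str],
                    image_labels: Iterable[str],
                    keyword_dictionary: Set[str]) -> List[str]:
    """Keep only the candidate words found in the keyword dictionary."""
    # Whitespace tokenisation of the caption is an assumption.
    candidates = caption.split() + list(celebrity_names) + list(image_labels)
    selected: List[str] = []
    seen: Set[str] = set()
    for word in candidates:
        if word in keyword_dictionary and word not in seen:
            seen.add(word)          # avoid returning the same keyword twice
            selected.append(word)
    return selected
```

Restricting the output to dictionary words keeps the downlink payload small and limits the robot to vocabulary for which an utterance can be formed.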

When the cloud server 4 receives a video file from the data transmission/reception device 3, it performs the same processing and transmits the keywords related to the video in the video file to the data transmission/reception device 3 that sent the video file.

[Data transmission/reception device]
Next, the data transmission/reception devices 2 and 3 shown in Fig. 1 will be described in detail. Fig. 2 is a block diagram showing a configuration example of the data transmission/reception devices 2 and 3 according to the embodiment of the present invention.

The data transmission/reception device 2 includes a video acquisition unit 51, a video file generation unit 52, a voice acquisition unit 53, a voice file generation unit 54, a remote voice playback unit (voice playback unit) 55, a spoken sentence generation unit 56, a template storage unit 57, a spoken sentence playback unit 58, and a transmission/reception processing unit 59. The configuration of the data transmission/reception device 3 is the same as that of the data transmission/reception device 2 shown in Fig. 2.

The video acquisition unit 51 is a camera that captures the video displayed on the display 13; it acquires the video data and transfers it to the video file generation unit 52. The video file generation unit 52 receives the video data from the video acquisition unit 51, generates a video data file in a video file format that the cloud server 4 can process, and outputs it to the transmission/reception processing unit 59 as a video file. The video file format is, for example, MP4.

The voice acquisition unit 53 is a microphone that picks up the speech of the person 12; it acquires the speech data and transfers it to the voice file generation unit 54. The voice file generation unit 54 receives the speech data from the voice acquisition unit 53, generates a data file of the speech in a common audio file format, and outputs it to the transmission/reception processing unit 59 as a voice file. The audio file format is, for example, MP3.

The remote voice playback unit 55 receives the remote voice file (described below) from the transmission/reception processing unit 59 and plays it.

The remote voice file is the voice file generated from the speech data of the person 22 at location B. Playing it allows the person 12 at location A to hear the speech of the person 22 at location B.

The remote voice playback unit 55 may have a speaker function. In this case, the speech of the person 22 at location B is output from the remote voice playback unit 55 of the data transmission/reception device 2 provided in the robot 11.

また、遠隔発声再生部５５は、スピーカの機能を有していなくてもよい。この場合のスピーカは、データ送受信装置２とは独立して外部に設置される。遠隔発声再生部５５は、例えばブルートゥース（登録商標）等の通信手段を介して、遠隔発声ファイルを外部のスピーカへ送信し、外部のスピーカは、遠隔発声ファイルを受信して再生を行う。外部のスピーカとしては、例えばスマートフォン、タブレット等のモバイル端末のスピーカ機能が用いられる。モバイル端末は、場所Ｂに存在する人２２の顔を画面表示しながら、人２２の発声の遠隔発声ファイルを受信して再生を行うようにしてもよい。 Alternatively, the remote speech playback unit 55 need not have a speaker function. In this case, the speaker is installed externally, independent of the data transmission/reception device 2. The remote speech playback unit 55 transmits the remote speech file to the external speaker via a communication means such as Bluetooth (registered trademark), and the external speaker receives and plays the remote speech file. The external speaker may be, for example, the speaker function of a mobile terminal such as a smartphone or tablet. The mobile terminal may receive and play the remote speech file of person 22's speech while displaying the face of person 22 present at location B on its screen.

これにより、場所Ｂに存在する人２２の発声が、ロボット１１とは離れた箇所に設置されたスピーカからなされることとなり、人１２は、場所Ｂに存在する人２２の発声と、ロボット１１の発話とを容易に区別することができる。 As a result, the speech of person 22 in location B is made from a speaker installed at a location away from robot 11, and person 12 can easily distinguish between the speech of person 22 in location B and the speech of robot 11.

発話文生成部５６は、送受信処理部５９から転送されたキーワード（映像ファイルの映像に関連するキーワード）を入力すると共に、テンプレート保持部５７からテンプレートを読み出し、キーワード及びテンプレートを組み合わせることで、発話文を生成する。そして、発話文生成部５６は、発話文を発話文再生部５８に出力する。 The utterance sentence generation unit 56 receives the keywords (keywords related to the video of the video file) transferred from the transmission/reception processing unit 59, reads the templates from the template storage unit 57, and generates an utterance sentence by combining the keywords and the templates. The utterance sentence generation unit 56 then outputs the utterance sentence to the utterance sentence playback unit 58.

テンプレート保持部５７は、文章の述語等のテンプレートを保持する。発話文再生部５８は、発話文生成部５６から発話文を入力し、発話文を再生する。これにより、場所Ａに存在する人１２は、場所Ａに存在するロボット１１の発話（ディスプレイ１３に表示された映像に関する情報）を聞くことができる。 The template storage unit 57 stores templates such as predicates of sentences. The speech sentence playback unit 58 inputs the speech sentence from the speech sentence generation unit 56 and plays back the speech sentence. This allows the person 12 present at location A to hear the speech of the robot 11 present at location A (information related to the image displayed on the display 13).

例えば発話文生成部５６が入力したキーワードがドラマの出演者名「〇○××」であり、テンプレートが「ですよ」である場合、発話文生成部５６は、発話文「○○××ですよ」を生成する。テンプレート保持部５７は、例えば「ですよ」、「ですか」、「だね」等のテンプレートを保持する。 For example, if the keyword input to the utterance sentence generation unit 56 is the name of a drama cast member, "○○××", and the template is "desu yo", the utterance sentence generation unit 56 generates the utterance sentence "○○×× desu yo" (roughly, "It's ○○××"). The template storage unit 57 stores templates such as "desu yo", "desu ka", and "da ne".

尚、発話文生成部５６による発話文生成処理としては、例えば以下の非特許文献に記載された技術が用いられる。
［非特許文献］
金子豊、外２名、“テレビ視聴ロボットにおける字幕文内キーワードに基づく発話生成手法”、［online］、映像情報メディア学会、［令和３年１月１１日検索］、インターネット＜https://www.jstage.jst.go.jp/article/iteac/2017/0/2017_32B-4/_pdf＞
Note that the utterance sentence generation process performed by the utterance sentence generation unit 56 uses, for example, the technology described in the following non-patent literature.
[Non-Patent Literature]
Yutaka Kaneko and two others, "A method for generating utterances based on keywords in subtitles for a television viewing robot," [online], Institute of Image Information and Television Engineers, [retrieved January 11, 2021], Internet <https://www.jstage.jst.go.jp/article/iteac/2017/0/2017_32B-4/_pdf>

また、発話文生成部５６は、テンプレート保持部５７に保持されたテンプレートをランダムに選択し、キーワード及びテンプレートを組み合わせることで、発話文を生成するようにしてもよい。 The utterance sentence generation unit 56 may also generate an utterance sentence by randomly selecting a template stored in the template storage unit 57 and combining the keyword and the template.
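The random-template variant described above can be illustrated with a minimal sketch. It assumes that "combining" means appending the template to the keyword, as in the "○○××ですよ" example; the names `TEMPLATES` and `generate_utterance` are hypothetical and do not appear in the specification.

```python
import random

# Templates named in the description; an actual template storage unit
# may hold others (assumption: only these three are shown here).
TEMPLATES = ["ですよ", "ですか", "だね"]


def generate_utterance(keyword, templates=TEMPLATES, rng=random):
    """Build an utterance by appending a randomly selected template to the keyword.

    `rng` is any object with a `choice` method (the `random` module itself
    or a seeded `random.Random` instance), which keeps the sketch testable.
    """
    if not templates:
        raise ValueError("template store is empty")
    return keyword + rng.choice(templates)


if __name__ == "__main__":
    print(generate_utterance("○○××"))
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible, which may be convenient when evaluating the robot's utterances.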

送受信処理部５９は、映像ファイル生成部５２から映像ファイルを入力すると共に、発声ファイル生成部５４から発声ファイルを入力する。送受信処理部５９は、発声ファイルをデータ送受信装置３へ送信し、データ送受信装置３から遠隔発声ファイルを受信し、遠隔発声ファイルを遠隔発声再生部５５に転送する。 The transmission/reception processing unit 59 receives the video file from the video file generation unit 52 and the speech file from the speech file generation unit 54. The transmission/reception processing unit 59 transmits the speech file to the data transmission/reception device 3, receives the remote speech file from the data transmission/reception device 3, and transfers the remote speech file to the remote speech playback unit 55.

送受信処理部５９は、映像ファイルをクラウドサーバ４へ送信し、クラウドサーバ４からキーワードを受信することで、ラウンドトリップタイムに相当する時間情報Ｔａを算出する。そして、送受信処理部５９は、時間情報Ｔａ（遠隔時間情報Ｔｂ）をデータ送受信装置３へ送信する。時間情報Ｔａは、データ送受信装置２におけるインターネット３０の混雑度合いを反映した情報である。 The transmission/reception processing unit 59 transmits the video file to the cloud server 4 and receives a keyword from the cloud server 4 to calculate time information Ta equivalent to the round trip time. The transmission/reception processing unit 59 then transmits the time information Ta (remote time information Tb) to the data transmission/reception device 3. The time information Ta is information that reflects the degree of congestion of the Internet 30 in the data transmission/reception device 2.

送受信処理部５９は、データ送受信装置３から、データ送受信装置３におけるラウンドトリップタイムに相当する遠隔時間情報Ｔｂを受信する。そして、送受信処理部５９は、遠隔時間情報Ｔｂから時間情報Ｔａを減算し、減算結果に基づいてデータ送受信装置２，３間のインターネット３０の混雑度合いを判定する。遠隔時間情報Ｔｂは、データ送受信装置３におけるインターネット３０の混雑度合いを反映した情報である。 The transmission/reception processing unit 59 receives remote time information Tb corresponding to the round trip time at the data transmission/reception device 3 from the data transmission/reception device 3. The transmission/reception processing unit 59 then subtracts the time information Ta from the remote time information Tb, and determines the degree of congestion on the Internet 30 between the data transmission/reception devices 2 and 3 based on the subtraction result. The remote time information Tb is information that reflects the degree of congestion on the Internet 30 at the data transmission/reception device 3.

送受信処理部５９は、当該データ送受信装置２におけるインターネット３０の混雑度合いがデータ送受信装置３よりも低いと判定した場合、クラウドサーバ４への映像ファイルの送信処理、及びクラウドサーバ４からのキーワードの受信処理を継続する。送受信処理部５９は、クラウドサーバ４から受信したキーワードを発話文生成部５６に転送し、データ送受信装置３からのリクエストに従い、キーワードをデータ送受信装置３へ送信する。 When the transmission/reception processing unit 59 determines that the congestion level of the Internet 30 in the data transmission/reception device 2 is lower than that of the data transmission/reception device 3, it continues the process of transmitting the video file to the cloud server 4 and the process of receiving the keyword from the cloud server 4. The transmission/reception processing unit 59 transfers the keyword received from the cloud server 4 to the speech sentence generation unit 56, and transmits the keyword to the data transmission/reception device 3 in accordance with the request from the data transmission/reception device 3.

一方、送受信処理部５９は、当該データ送受信装置２におけるインターネット３０の混雑度合いがデータ送受信装置３よりも高いと判定した場合、クラウドサーバ４への映像ファイルの送信処理を停止する。そして、送受信処理部５９は、データ送受信装置３へリクエストを送信することで、データ送受信装置３からキーワードを受信し、キーワードを発話文生成部５６に転送する。 On the other hand, if the transmission/reception processing unit 59 determines that the congestion level of the Internet 30 in the data transmission/reception device 2 is higher than that of the data transmission/reception device 3, it stops the transmission process of the video file to the cloud server 4. Then, the transmission/reception processing unit 59 receives keywords from the data transmission/reception device 3 by sending a request to the data transmission/reception device 3, and transfers the keywords to the speech sentence generation unit 56.

〔送受信処理部５９〕
次に、図２に示したデータ送受信装置２，３に備えた送受信処理部５９について詳細に説明する。図３は、送受信処理部５９の構成例を示すブロック図である。
[Transmission/reception processing unit 59]
Next, a detailed description will be given of the transmission/reception processing unit 59 provided in the data transmission/reception devices 2 and 3 shown in Fig. 2. Fig. 3 is a block diagram showing an example of the configuration of the transmission/reception processing unit 59.

この送受信処理部５９は、映像ファイル取得部６０、キーワード転送部６１、時計６２、時間算出部６３、時間比較部６４、遠隔発声ファイル受信部６５、キーワード受信時刻取得部６６、キーワード受信部６７、キーワード送信部６８、映像ファイル送信時刻取得部６９、送信制御部７０、発声ファイル送信部７１、映像ファイル送信部７２、遠隔時間情報取得部７３及び時間情報送信部７４を備えている。 This transmission/reception processing unit 59 includes a video file acquisition unit 60, a keyword transfer unit 61, a clock 62, a time calculation unit 63, a time comparison unit 64, a remote speech file receiving unit 65, a keyword reception time acquisition unit 66, a keyword receiving unit 67, a keyword transmission unit 68, a video file transmission time acquisition unit 69, a transmission control unit 70, a speech file transmission unit 71, a video file transmission unit 72, a remote time information acquisition unit 73, and a time information transmission unit 74.

映像ファイル取得部６０は、映像ファイル生成部５２から映像ファイルを取得し、映像ファイルを映像ファイル送信部７２に転送する。これにより、映像ファイルは、映像ファイル送信部７２からクラウドサーバ４へ送信される。 The video file acquisition unit 60 acquires the video file from the video file generation unit 52 and transfers the video file to the video file transmission unit 72. As a result, the video file is transmitted from the video file transmission unit 72 to the cloud server 4.

キーワード転送部６１は、キーワード受信部６７からキーワードを入力し、キーワードを発話文生成部５６に転送する。キーワードは、キーワード受信部６７によりクラウドサーバ４または遠隔の場所Ｂに設置されたデータ送受信装置３から受信される。 The keyword transfer unit 61 receives a keyword from the keyword receiving unit 67 and transfers the keyword to the speech sentence generating unit 56. The keyword is received by the keyword receiving unit 67 from the cloud server 4 or the data transmission/reception device 3 installed at the remote location B.

これにより、キーワードは、ディスプレイ１３に表示されている映像に関連するキーワードとして、発話文生成部５６、テンプレート保持部５７及び発話文再生部５８により、ロボット１１の発話文として再生される。 As a result, the keyword, being a keyword related to the video displayed on the display 13, is played back as an utterance sentence of the robot 11 by the utterance sentence generation unit 56, the template storage unit 57, and the utterance sentence playback unit 58.

時計６２は、当該データ送受信装置２における時刻をカウントする。時間算出部６３は、映像ファイル送信時刻取得部６９から送信開始時刻ｔ１を入力すると共に、キーワード受信時刻取得部６６から受信完了時刻ｔ２を入力する。送信開始時刻ｔ１は、映像ファイル送信部７２が映像ファイルをクラウドサーバ４へ送信した時刻であり、受信完了時刻ｔ２は、キーワード受信部６７がキーワードをクラウドサーバ４から受信した時刻である。 The clock 62 counts the time in the data transmission/reception device 2. The time calculation unit 63 inputs the transmission start time t1 from the video file transmission time acquisition unit 69 and the reception completion time t2 from the keyword reception time acquisition unit 66. The transmission start time t1 is the time when the video file transmission unit 72 transmits the video file to the cloud server 4, and the reception completion time t2 is the time when the keyword reception unit 67 receives the keyword from the cloud server 4.

時間算出部６３は、以下の式にて、受信完了時刻ｔ２から送信開始時刻ｔ１を減算することで、当該データ送受信装置２が映像ファイルを送信してからキーワードを受信するまでの間の時間情報Ｔａ（ラウンドトリップタイムに相当）を求める。そして、時間算出部６３は、時間情報Ｔａを時間比較部６４に出力する。
［数１］
Ｔａ＝ｔ２－ｔ１・・・（１）
The time calculation unit 63 subtracts the transmission start time t1 from the reception completion time t2 using the following formula to obtain time information Ta (corresponding to the round trip time) from when the data transmission/reception device 2 transmits the video file to when it receives the keyword. Then, the time calculation unit 63 outputs the time information Ta to the time comparison unit 64.
[Equation 1]
Ta = t2 - t1 ... (1)
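Equation (1) can be illustrated with a minimal sketch. The callables `send_video_file` and `receive_keyword` are hypothetical stand-ins for the video file transmission unit 72 and the keyword receiving unit 67, and the use of a monotonic clock is an assumption made for the sketch; the specification only states that the times t1 and t2 are read from the clock 62.

```python
import time


def measure_round_trip(send_video_file, receive_keyword, clock=time.monotonic):
    """Return (keyword, Ta), where Ta = t2 - t1 as in equation (1).

    t1 is read when transmission of the video file starts, and t2 is read
    when reception of the keyword completes.
    """
    t1 = clock()                 # transmission start time t1
    send_video_file()            # hypothetical: upload the video file
    keyword = receive_keyword()  # hypothetical: block until the keyword arrives
    t2 = clock()                 # reception completion time t2
    return keyword, t2 - t1


if __name__ == "__main__":
    kw, ta = measure_round_trip(lambda: time.sleep(0.05), lambda: "keyword")
    print(kw, round(ta, 3))
```

Injecting the clock as a parameter allows the calculation to be checked with fixed timestamps instead of real network delays.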

時間比較部６４は、時間算出部６３から時間情報Ｔａを入力すると共に、遠隔時間情報取得部７３から遠隔時間情報Ｔｂを入力する。遠隔時間情報Ｔｂは、遠隔の場所Ｂに設置されたデータ送受信装置３において、当該データ送受信装置２と同様の処理により前記式（１）にて算出された時間情報Ｔａであり、遠隔時間情報取得部７３によりデータ送受信装置３から受信される。 The time comparison unit 64 inputs the time information Ta from the time calculation unit 63, and also inputs the remote time information Tb from the remote time information acquisition unit 73. The remote time information Tb is the time information Ta calculated by the data transmission/reception device 3 installed at the remote location B using the same processing as that of the data transmission/reception device 2, using the above formula (1), and is received from the data transmission/reception device 3 by the remote time information acquisition unit 73.

時間比較部６４は、以下の式にて、遠隔時間情報Ｔｂから時間情報Ｔａを減算することで、当該データ送受信装置２におけるインターネット３０の混雑度合いを判定するための時間情報Ｔ（混雑度合い判定用時間情報Ｔ）を求める。
［数２］
Ｔ＝Ｔｂ－Ｔａ・・・（２）
The time comparison unit 64 obtains time information T (time information T for congestion degree determination) for determining the congestion degree of the Internet 30 in the data transmission/reception device 2 by subtracting the time information Ta from the remote time information Tb using the following formula.
[Equation 2]
T = Tb - Ta ... (2)

時間比較部６４は、混雑度合い判定用時間情報Ｔを送信制御部７０に出力し、時間情報Ｔａを時間情報送信部７４に転送する。時間情報Ｔａは、時間情報送信部７４によりデータ送受信装置３へ送信され、データ送受信装置３は、当該時間情報Ｔａを遠隔時間情報Ｔｂとして受信する。 The time comparison unit 64 outputs the time information T for determining the degree of congestion to the transmission control unit 70 and transfers the time information Ta to the time information transmission unit 74. The time information Ta is transmitted by the time information transmission unit 74 to the data transmission/reception device 3, and the data transmission/reception device 3 receives the time information Ta as remote time information Tb.
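Because each device sends its own Ta to the other device, which receives it as Tb, the value T of equation (2) computed at one device is the negative of the value computed at the other. The following minimal sketch illustrates this with made-up round trip times; the function name `congestion_margin` and the numeric values are hypothetical.

```python
def congestion_margin(ta, tb):
    """Equation (2): T = Tb - Ta, computed from the local device's point of view."""
    return tb - ta


# Illustrative round trip times in seconds (not taken from the specification).
ta_device2 = 0.75  # Ta measured at the data transmission/reception device 2
ta_device3 = 0.5   # Ta measured at the data transmission/reception device 3

# Each device uses the other device's Ta as its own Tb.
t_device2 = congestion_margin(ta_device2, ta_device3)  # negative: device 2 is more congested
t_device3 = congestion_margin(ta_device3, ta_device2)  # positive: device 3 is less congested

if __name__ == "__main__":
    print(t_device2, t_device3)
```

Under this reading, whenever the two round trip times differ, exactly one of the two devices obtains T < 0, so only one device is directed to stop transmitting video files.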

遠隔発声ファイル受信部６５は、データ送受信装置３から人２２の発声のデータファイルを遠隔発声ファイルとして受信し、遠隔発声ファイルを遠隔発声再生部５５に転送する。 The remote speech file receiving unit 65 receives the data file of the person 22's speech from the data transmission/reception device 3 as a remote speech file, and transfers the remote speech file to the remote speech playback unit 55.

これにより、遠隔発声ファイルは、遠隔発声再生部５５により、遠隔の場所Ｂに存在する人２２の発声として再生される。 As a result, the remote speech file is played by the remote speech playback unit 55 as the speech of person 22 present at remote location B.

キーワード受信時刻取得部６６は、キーワード受信部６７から、クラウドサーバ４から受信したキーワードの受信タイミングを示す受信完了を入力する。そして、キーワード受信時刻取得部６６は、受信完了を入力したときの時刻を時計６２から取得し、当該時刻を受信完了時刻ｔ２に設定し、受信完了時刻ｔ２を時間算出部６３に出力する。 The keyword reception time acquisition unit 66 inputs reception completion, which indicates the timing of reception of the keyword received from the cloud server 4, from the keyword receiving unit 67. The keyword reception time acquisition unit 66 then acquires the time when reception completion was input from the clock 62, sets this time as reception completion time t2, and outputs reception completion time t2 to the time calculation unit 63.

キーワード受信部６７は、クラウドサーバ４からキーワードを受信した場合、その受信タイミングを示す受信完了をキーワード受信時刻取得部６６に出力すると共に、キーワードをキーワード転送部６１及びキーワード送信部６８に出力する。 When the keyword receiving unit 67 receives a keyword from the cloud server 4, it outputs a reception completion indicating the timing of reception to the keyword reception time acquisition unit 66, and outputs the keyword to the keyword transfer unit 61 and the keyword transmission unit 68.

キーワード受信部６７は、送信制御部７０から送信停止指示または送信再開指示を入力する。送信停止指示及び送信再開指示の詳細については後述する。キーワード受信部６７は、送信停止指示を入力した場合、例えばキーワードの送信のリクエストをデータ送受信装置３へ送信し、データ送受信装置３からキーワードを受信し、キーワードをキーワード転送部６１に出力する。 The keyword receiving unit 67 inputs a transmission stop instruction or a transmission resume instruction from the transmission control unit 70. Details of the transmission stop instruction and the transmission resume instruction will be described later. When the keyword receiving unit 67 inputs a transmission stop instruction, it transmits, for example, a request to transmit a keyword to the data transmission/reception device 3, receives the keyword from the data transmission/reception device 3, and outputs the keyword to the keyword transfer unit 61.

この場合、データ送受信装置３は、データ送受信装置２から当該リクエストを受信すると、当該データ送受信装置３のキーワード送信部６８は、クラウドサーバ４から受信したキーワードをデータ送受信装置２へ送信する。 In this case, when the data transmission/reception device 3 receives the request from the data transmission/reception device 2, the keyword transmission unit 68 of the data transmission/reception device 3 transmits the keyword received from the cloud server 4 to the data transmission/reception device 2.

一方、キーワード受信部６７は、送信制御部７０から送信再開指示を入力した場合、前述のデータ送受信装置３へのリクエストの送信処理を停止する。そして、キーワード受信部６７は、映像ファイル送信部７２により映像ファイルがクラウドサーバ４へ送信された後、クラウドサーバ４からキーワードを受信し、キーワードをキーワード転送部６１及びキーワード送信部６８に出力する。 On the other hand, when the keyword receiving unit 67 receives a transmission resume instruction from the transmission control unit 70, it stops transmitting the above-described request to the data transmission/reception device 3. Then, after the video file has been transmitted to the cloud server 4 by the video file transmission unit 72, the keyword receiving unit 67 receives a keyword from the cloud server 4 and outputs the keyword to the keyword transfer unit 61 and the keyword transmission unit 68.

キーワード送信部６８は、キーワード受信部６７からキーワードを入力し、キーワードをデータ送受信装置３へ送信する。これにより、クラウドサーバ４から受信したキーワードが、データ送受信装置２からデータ送受信装置３へ送信される。 The keyword transmission unit 68 receives the keyword from the keyword reception unit 67 and transmits the keyword to the data transmission/reception device 3. As a result, the keyword received from the cloud server 4 is transmitted from the data transmission/reception device 2 to the data transmission/reception device 3.

映像ファイル送信時刻取得部６９は、映像ファイル送信部７２から、クラウドサーバ４へ映像ファイルを送信した送信タイミングを示す送信開始を入力する。そして、映像ファイル送信時刻取得部６９は、送信開始を入力したときの時刻を時計６２から取得し、当該時刻を送信開始時刻ｔ１に設定し、送信開始時刻ｔ１を時間算出部６３に出力する。 The video file transmission time acquisition unit 69 inputs a transmission start indicating the transmission timing when the video file was transmitted to the cloud server 4 from the video file transmission unit 72. The video file transmission time acquisition unit 69 then acquires the time when the transmission start was input from the clock 62, sets this time as the transmission start time t1, and outputs the transmission start time t1 to the time calculation unit 63.

送信制御部７０は、時間比較部６４から混雑度合い判定用時間情報Ｔを入力し、混雑度合い判定用時間情報Ｔに基づいて、当該データ送受信装置２における混雑度合いを判定する。このデータ送受信装置２における混雑度合いには、当該データ送受信装置２におけるインターネット３０の混雑度合いがデータ送受信装置３よりも高い場合、低い場合、及びデータ送受信装置３と同じ場合の３状態がある。 The transmission control unit 70 inputs the congestion degree determination time information T from the time comparison unit 64, and determines the congestion degree of the data transmission/reception device 2 based on the congestion degree determination time information T. There are three states for the congestion degree of the data transmission/reception device 2: when the congestion degree of the Internet 30 of the data transmission/reception device 2 is higher than that of the data transmission/reception device 3, when it is lower, and when it is the same as that of the data transmission/reception device 3.

送信制御部７０は、混雑度合いが高いと判定した場合、映像ファイルの送信処理を停止することを示す送信停止指示を映像ファイル送信部７２及びキーワード受信部６７に出力する。そして、送信制御部７０は、時計６２から時刻を取得しながら、予め設定された値をＮとして時間Ｎの間待機し、時間Ｎ経過後に、映像ファイルの送信処理を再開することを示す送信再開指示を映像ファイル送信部７２及びキーワード受信部６７に出力する。 When the transmission control unit 70 determines that the congestion level is high, it outputs a transmission stop instruction to the video file transmission unit 72 and the keyword receiving unit 67, indicating that the transmission process of the video file should be stopped. Then, while obtaining the time from the clock 62, the transmission control unit 70 waits for a time N, where N is a preset value, and after the time N has elapsed, it outputs a transmission resume instruction to the video file transmission unit 72 and the keyword receiving unit 67, indicating that the transmission process of the video file should be resumed.

図４は、送信制御部７０の処理例を示すフローチャートである。送信制御部７０は、時間比較部６４から混雑度合い判定用時間情報Ｔを入力したか否かを判定する（ステップＳ４０１）。 Figure 4 is a flowchart showing an example of processing by the transmission control unit 70. The transmission control unit 70 determines whether or not time information T for determining the congestion degree has been input from the time comparison unit 64 (step S401).

送信制御部７０は、ステップＳ４０１において、混雑度合い判定用時間情報Ｔを入力していない場合（ステップＳ４０１：Ｎ）、混雑度合い判定用時間情報Ｔを入力するまで待つ。一方、送信制御部７０は、ステップＳ４０１において、混雑度合い判定用時間情報Ｔを入力した場合（ステップＳ４０１：Ｙ）、ステップＳ４０２へ移行する。 If the transmission control unit 70 has not input the time information T for determining the degree of congestion in step S401 (step S401: N), the transmission control unit 70 waits until the time information T for determining the degree of congestion is input. On the other hand, if the transmission control unit 70 has input the time information T for determining the degree of congestion in step S401 (step S401: Y), the transmission control unit 70 proceeds to step S402.

送信制御部７０は、混雑度合い判定用時間情報Ｔが０よりも小さい（Ｔ＜０、すなわちＴｂ＜Ｔａ）、０よりも大きい（Ｔ＞０、すなわちＴｂ＞Ｔａ）、または０である（Ｔ＝０、すなわちＴｂ＝Ｔａ）を判定する（ステップＳ４０２）。 The transmission control unit 70 determines whether the time information T for determining the congestion degree is smaller than 0 (T<0, i.e., Tb<Ta), greater than 0 (T>0, i.e., Tb>Ta), or 0 (T=0, i.e., Tb=Ta) (step S402).

送信制御部７０は、ステップＳ４０２において、混雑度合い判定用時間情報Ｔが０よりも小さい（Ｔ＜０、すなわちＴｂ＜Ｔａ）と判定した場合、送信停止指示を映像ファイル送信部７２及びキーワード受信部６７に出力する（ステップＳ４０３）。 If the transmission control unit 70 determines in step S402 that the congestion level determination time information T is less than 0 (T<0, i.e., Tb<Ta), it outputs a transmission stop instruction to the video file transmission unit 72 and the keyword reception unit 67 (step S403).

これにより、当該データ送受信装置２におけるインターネット３０の混雑度合いがデータ送受信装置３よりも高いものと判断される。つまり、当該データ送受信装置２のラウンドトリップタイムがデータ送受信装置３のラウンドトリップタイムよりも大きいものと判断される。そして、当該データ送受信装置２からクラウドサーバ４への映像ファイルの送信処理が停止し、データ送受信装置３からクラウドサーバ４への映像ファイルの送信処理が行われる。 As a result, the degree of congestion on the Internet 30 is determined to be higher in the data transmission/reception device 2 than in the data transmission/reception device 3. In other words, the round trip time of the data transmission/reception device 2 is determined to be longer than the round trip time of the data transmission/reception device 3. Then, the process of transmitting the video file from the data transmission/reception device 2 to the cloud server 4 is stopped, and the process of transmitting the video file from the data transmission/reception device 3 to the cloud server 4 is performed.

送信制御部７０は、ステップＳ４０３から移行して、送信停止指示を出力してからの経過時間を時計６２から特定する。そして、送信制御部７０は、時計６２の時刻に基づいて、予め設定された値をＮとして時間Ｎ（キーワード受信部６７が遠隔時間情報Ｔｂの送信のリクエストをデータ送受信装置３へ送信してから時間Ｎ）の間待機する（ステップＳ４０４）。予め設定された時間Ｎは、例えば１０秒、６０秒である。そして、送信制御部７０は、経過時間が時間Ｎに到達したときに、送信再開指示を映像ファイル送信部７２及びキーワード受信部６７に出力する（ステップＳ４０５）。 The transmission control unit 70 proceeds from step S403 and determines from the clock 62 the time that has elapsed since the transmission stop instruction was output. Then, based on the time on the clock 62, the transmission control unit 70 waits for a time N (the time N since the keyword receiving unit 67 sent a request to transmit the remote time information Tb to the data transmission/reception device 3), with N being a preset value (step S404). The preset time N is, for example, 10 seconds or 60 seconds. Then, when the elapsed time reaches time N, the transmission control unit 70 outputs a transmission resume instruction to the video file transmitting unit 72 and the keyword receiving unit 67 (step S405).

これにより、データ送受信装置２，３におけるインターネット３０の混雑度合いの変化に伴い、映像ファイルを送信してキーワードを受信するデータ送受信装置２，３を切り替えることができる。すなわち、混雑度合いの低い方のデータ送受信装置２，３は、時間Ｎ毎に、映像ファイルをクラウドサーバ４へ送信し、クラウドサーバ４からキーワードを受信することとなる。 As a result, the device that transmits the video file and receives the keyword can be switched between the data transmission/reception devices 2 and 3 as the congestion level of the Internet 30 at each device changes. In other words, whichever of the data transmission/reception devices 2 and 3 is less congested transmits the video file to the cloud server 4 and receives the keyword from the cloud server 4 at every interval of time N.

一方、送信制御部７０は、ステップＳ４０２の処理において、混雑度合い判定用時間情報Ｔが０よりも大きい（Ｔ＞０、すなわちＴｂ＞Ｔａ）と判定した場合、映像ファイル送信部７２及びキーワード受信部６７に対する通知を行わない。 On the other hand, if the transmission control unit 70 determines in step S402 that the congestion level determination time information T is greater than 0 (T>0, i.e., Tb>Ta), it does not notify the video file transmission unit 72 and the keyword reception unit 67.

尚、送信制御部７０は、映像ファイルの送信処理を継続することを示す送信継続指示を映像ファイル送信部７２及びキーワード受信部６７に出力するようにしてもよい。 The transmission control unit 70 may also output a continue transmission instruction to the video file transmission unit 72 and the keyword receiving unit 67 to continue the video file transmission process.

これにより、当該データ送受信装置２におけるインターネット３０の混雑度合いがデータ送受信装置３よりも低いものと判断される。つまり、当該データ送受信装置２のラウンドトリップタイムがデータ送受信装置３のラウンドトリップタイムよりも小さいものと判断される。そして、当該データ送受信装置２からクラウドサーバ４への映像ファイルの送信処理が継続することとなる。 As a result, the degree of congestion on the Internet 30 in the data transmission/reception device 2 is determined to be lower than that of the data transmission/reception device 3. In other words, the round trip time of the data transmission/reception device 2 is determined to be shorter than the round trip time of the data transmission/reception device 3. Then, the process of transmitting the video file from the data transmission/reception device 2 to the cloud server 4 continues.

一方、送信制御部７０は、ステップＳ４０２において、混雑度合い判定用時間情報Ｔが０である（Ｔ＝０、すなわちＴｂ＝Ｔａ）と判定した場合、前回と同じ処理を行う（ステップＳ４０６）。 On the other hand, if the transmission control unit 70 determines in step S402 that the congestion level determination time information T is 0 (T=0, i.e., Tb=Ta), it performs the same processing as the previous time (step S406).

つまり、送信制御部７０は、前回のステップＳ４０２の処理において、混雑度合い判定用時間情報Ｔが０よりも小さい（Ｔ＜０、すなわちＴｂ＜Ｔａ）と判定した場合、前述のステップＳ４０３～Ｓ４０５の処理を行う。一方、送信制御部７０は、前回のステップＳ４０２の処理において、混雑度合い判定用時間情報Ｔが０よりも大きい（Ｔ＞０、すなわちＴｂ＞Ｔａ）と判定した場合、映像ファイル送信部７２及びキーワード受信部６７に対する通知を行わない。 In other words, if the transmission control unit 70 determines in the previous processing of step S402 that the time information T for determining the degree of congestion is less than 0 (T<0, i.e., Tb<Ta), it performs the processing of steps S403 to S405 described above. On the other hand, if the transmission control unit 70 determines in the previous processing of step S402 that the time information T for determining the degree of congestion is greater than 0 (T>0, i.e., Tb>Ta), it does not notify the video file transmission unit 72 and the keyword receiving unit 67.

これにより、当該データ送受信装置２におけるインターネット３０の混雑度合いがデータ送受信装置３と同じであるものと判断される。つまり、当該データ送受信装置２のラウンドトリップタイムとデータ送受信装置３のラウンドトリップタイムとが同じであると判断される。そして、前回の処理と同様に、データ送受信装置３からクラウドサーバ４への映像ファイルの送信処理が行われるか、または、当該データ送受信装置２からクラウドサーバ４への映像ファイルの送信処理が行われる。 As a result, it is determined that the congestion level of the Internet 30 in the data transmission/reception device 2 is the same as that of the data transmission/reception device 3. In other words, it is determined that the round trip time of the data transmission/reception device 2 is the same as the round trip time of the data transmission/reception device 3. Then, as in the previous process, the data transmission/reception device 3 transmits the video file to the cloud server 4, or the data transmission/reception device 2 transmits the video file to the cloud server 4.

尚、送信制御部７０は、ステップＳ４０２において、混雑度合い判定用時間情報Ｔが０である（Ｔ＝０、すなわちＴｂ＝Ｔａ）と判定した場合、混雑度合い判定用時間情報Ｔが０よりも小さい（Ｔ＜０、すなわちＴｂ＜Ｔａ）ときの処理、または混雑度合い判定用時間情報Ｔが０よりも大きい（Ｔ＞０、すなわちＴｂ＞Ｔａ）ときの処理のうちのいずれか一方を行うようにすればよい。例えば送信制御部７０は、これらの処理のうち予め設定された処理を行う。 Note that when the transmission control unit 70 determines in step S402 that the congestion level determination time information T is 0 (T=0, i.e., Tb=Ta), it suffices for it to perform either the processing for the case where T is smaller than 0 (T<0, i.e., Tb<Ta) or the processing for the case where T is larger than 0 (T>0, i.e., Tb>Ta). For example, the transmission control unit 70 performs whichever of these two processes has been set in advance.
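The branching of steps S402 to S406 can be summarized in a minimal sketch. The class and enum names are hypothetical, and the action used when T = 0 occurs before any previous decision exists (`tie_default`) is an assumption; the specification only states that the previous processing is repeated or that a preset processing is performed.

```python
from enum import Enum


class Action(Enum):
    STOP_THEN_RESUME = "stop"  # steps S403 to S405: stop, wait time N, resume
    CONTINUE = "continue"      # T > 0: no notification, transmission continues


class TransmissionController:
    """Sketch of the decision made by the transmission control unit 70."""

    def __init__(self, tie_default=Action.CONTINUE):
        # Assumption: action taken if T == 0 occurs before any other decision.
        self._previous = tie_default

    def decide(self, t):
        """Return the action for congestion determination time information T."""
        if t < 0:        # Tb < Ta: this device is more congested than the peer
            action = Action.STOP_THEN_RESUME
        elif t > 0:      # Tb > Ta: this device is less congested than the peer
            action = Action.CONTINUE
        else:            # T == 0: same processing as the previous time (step S406)
            action = self._previous
        self._previous = action
        return action


if __name__ == "__main__":
    controller = TransmissionController()
    for t in (-0.2, 0.0, 0.3, 0.0):
        print(t, controller.decide(t).name)
```

The sketch only returns the decision; issuing the transmission stop instruction, waiting for time N, and issuing the transmission resume instruction would be handled by the caller.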

図３に戻って、発声ファイル送信部７１は、発声ファイル生成部５４から発声ファイルを入力し、発声ファイルをデータ送受信装置３へ送信する。 Returning to FIG. 3, the speech file transmission unit 71 receives the speech file from the speech file generation unit 54 and transmits the speech file to the data transmission/reception device 3.

これにより、人１２の発声ファイルはデータ送受信装置３へ送信され、データ送受信装置３は、人１２の発声ファイルを遠隔発声ファイルとして受信する。 As a result, the speech file of person 12 is transmitted to the data transmission/reception device 3, and the data transmission/reception device 3 receives the speech file of person 12 as a remote speech file.

映像ファイル送信部７２は、映像ファイル取得部６０から映像ファイルを入力すると共に、送信制御部７０から送信停止指示または送信再開指示を入力する。映像ファイル送信部７２は、初期処理の際に、映像ファイルをクラウドサーバ４へ送信する。そして、映像ファイル送信部７２は、初期処理の後の時間Ｎ毎のタイミングで、映像ファイルをクラウドサーバ４へ送信する処理、送信停止指示を入力した場合の処理、または送信再開指示を入力した場合の映像ファイルをクラウドサーバ４へ送信する処理を行う。 The video file transmission unit 72 inputs a video file from the video file acquisition unit 60, and also inputs a transmission stop instruction or a transmission resume instruction from the transmission control unit 70. The video file transmission unit 72 transmits a video file to the cloud server 4 during initial processing. Then, at every time N after the initial processing, the video file transmission unit 72 performs a process of transmitting a video file to the cloud server 4, a process when a transmission stop instruction has been input, or a process of transmitting a video file to the cloud server 4 when a transmission resume instruction has been input.

映像ファイル送信部７２は、初期処理の後の時間Ｎ毎のタイミングで、送信制御部７０から送信停止指示を入力した場合、クラウドサーバ４への映像ファイルの送信処理を停止する。一方、映像ファイル送信部７２は、初期処理の後の時間Ｎ毎のタイミングで、送信制御部７０から送信再開指示を入力した場合、クラウドサーバ４への映像ファイルの送信処理を再開する。映像ファイル送信部７２は、映像ファイルをクラウドサーバ４へ送信したときに、送信開始を映像ファイル送信時刻取得部６９に出力する。 When the video file transmission unit 72 receives a transmission stop instruction from the transmission control unit 70 at one of the timings that occur every time N after the initial processing, it stops the process of transmitting the video file to the cloud server 4. On the other hand, when the video file transmission unit 72 receives a transmission resume instruction from the transmission control unit 70 at one of these timings, it resumes the process of transmitting the video file to the cloud server 4. When the video file transmission unit 72 transmits the video file to the cloud server 4, it outputs a transmission start to the video file transmission time acquisition unit 69.

遠隔時間情報取得部７３は、時計６２の時刻に基づいて、後述する図５に示す当該送受信処理部５９による処理のスタート（ステップＳ５０１の処理）から時間Ｎ経過したときに、例えば遠隔時間情報Ｔｂの送信のリクエストをデータ送受信装置３へ送信する。そして、遠隔時間情報取得部７３は、データ送受信装置３から遠隔時間情報Ｔｂを受信し、遠隔時間情報Ｔｂを時間比較部６４に転送する。前述のとおり、遠隔時間情報Ｔｂは、データ送受信装置３において、当該データ送受信装置２と同様の処理により前記式（１）にて算出された時間情報Ｔａである。 The remote time information acquisition unit 73 transmits, for example, a request to transmit remote time information Tb to the data transmission/reception device 3 when a time N has elapsed since the start of processing by the transmission/reception processing unit 59 shown in FIG. 5 (processing of step S501) described later, based on the time of the clock 62. Then, the remote time information acquisition unit 73 receives the remote time information Tb from the data transmission/reception device 3 and transfers the remote time information Tb to the time comparison unit 64. As described above, the remote time information Tb is the time information Ta calculated in the data transmission/reception device 3 by the same processing as that of the data transmission/reception device 2 using the above formula (1).

時間情報送信部７４は、時間比較部６４から時間情報Ｔａを入力し、例えばデータ送受信装置３からの遠隔時間情報Ｔｂの送信のリクエストに従い、時間情報Ｔａをデータ送受信装置３へ送信する。これにより、時間情報Ｔａはデータ送受信装置３へ送信され、データ送受信装置３は、時間情報Ｔａを遠隔時間情報Ｔｂとして受信する。この場合、データ送受信装置３は、後述する図５に示すデータ送受信装置３の送受信処理部５９による処理のスタート（ステップＳ５０１の処理）から時間Ｎ経過したときに、遠隔時間情報Ｔｂの送信のリクエストを当該データ送受信装置２へ送信する。 The time information transmission unit 74 inputs the time information Ta from the time comparison unit 64, and transmits the time information Ta to the data transmission/reception device 3, for example, in response to a request from the data transmission/reception device 3 to transmit remote time information Tb. As a result, the time information Ta is transmitted to the data transmission/reception device 3, and the data transmission/reception device 3 receives the time information Ta as remote time information Tb. In this case, the data transmission/reception device 3 transmits a request to transmit remote time information Tb to the data transmission/reception device 2 when time N has elapsed since the start of processing by the transmission/reception processing unit 59 of the data transmission/reception device 3 shown in FIG. 5 (described later) (processing of step S501).

（送受信処理部５９による映像ファイルの送信、送信停止及び送信再開処理等）
次に、図３に示した送受信処理部５９による映像ファイルの送信、送信停止及び送信再開処理等について説明する。図５は、送受信処理部５９によるこれらの処理の例を説明するフローチャートであり、図６は、そのタイミングを説明する図である。
(Transmission of video files, transmission stop, transmission restart processing, etc. by the transmission/reception processing unit 59)
Next, a description will be given of the transmission, transmission stop, and transmission restart processes of a video file by the transmission/reception processing unit 59 shown in Fig. 3. Fig. 5 is a flow chart for explaining an example of these processes by the transmission/reception processing unit 59, and Fig. 6 is a diagram for explaining the timing thereof.

図６に示すように、送受信処理部５９は、予め設定された時間Ｎを単位として、時間Ｎの区間毎に「映像ファイル送信」、「送信停止」及び「送信再開」のうちのいずれかの処理にて動作する。 As shown in FIG. 6, the transmission/reception processing unit 59 operates in units of a preset time N, performing one of the following processes for each interval of time N: "send video file," "stop transmission," or "resume transmission."

尚、図１に示したデータ送受信システム１において、データ送受信装置２の送受信処理部５９による動作と、データ送受信装置３の送受信処理部５９による動作とは、同期しているものとする。データ送受信装置２及びデータ送受信装置３は、例えばＮＴＰ（Network Time Protocol）を用いてそれぞれの時計６２を合わせる等して同期を実現し、両装置の送受信処理部５９による処理のスタートのタイミングを合わせる。 In the data transmission/reception system 1 shown in FIG. 1, the operation by the transmission/reception processing unit 59 of the data transmission/reception device 2 and the operation by the transmission/reception processing unit 59 of the data transmission/reception device 3 are assumed to be synchronized. The data transmission/reception device 2 and the data transmission/reception device 3 achieve synchronization by synchronizing their respective clocks 62 using, for example, NTP (Network Time Protocol), and synchronize the start timing of processing by the transmission/reception processing units 59 of both devices.

送受信処理部５９は、当該送受信処理部５９の処理をスタートすると、映像ファイル生成部５２から映像ファイルを取得し（ステップＳ５０１）、映像ファイルをクラウドサーバ４へ送信する（ステップＳ５０２）。送受信処理部５９は、映像ファイルを送信したときの時刻を送信開始時刻ｔ１として取得する（ステップＳ５０３）。 When the transmission/reception processing unit 59 starts processing, it acquires a video file from the video file generating unit 52 (step S501) and transmits the video file to the cloud server 4 (step S502). The transmission/reception processing unit 59 acquires the time when the video file is transmitted as the transmission start time t1 (step S503).

送受信処理部５９は、クラウドサーバ４からキーワードを受信し（ステップＳ５０４）、キーワードを受信したときの時刻を受信完了時刻ｔ２として取得する（ステップＳ５０５）。そして、送受信処理部５９は、前記式（１）にて、受信完了時刻ｔ２から送信開始時刻ｔ１を減算することで時間情報Ｔａを求める（ステップＳ５０６）。 The transmission/reception processing unit 59 receives the keyword from the cloud server 4 (step S504), and obtains the time when the keyword is received as the reception completion time t2 (step S505). The transmission/reception processing unit 59 then obtains the time information Ta by subtracting the transmission start time t1 from the reception completion time t2 using the above formula (1) (step S506).
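
Steps S502 to S506 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the embodiment: `measure_ta`, `send_video_file`, `receive_keyword`, and the injectable `clock` are hypothetical names, and the sketch assumes that t1 is sampled immediately before the upload call so that a blocking upload is included in Ta.

```python
import time
from typing import Callable, Tuple


def measure_ta(
    send_video_file: Callable[[], None],
    receive_keyword: Callable[[], str],
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, float]:
    """Return (keyword, Ta), where Ta = t2 - t1 as in formula (1)."""
    t1 = clock()                 # transmission start time t1 (step S503)
    send_video_file()            # upload to the cloud server (step S502)
    keyword = receive_keyword()  # wait for the keyword (step S504)
    t2 = clock()                 # reception completion time t2 (step S505)
    return keyword, t2 - t1      # time information Ta (step S506)
```

A monotonic clock is used here for the local difference because it cannot jump backwards; the wall clock synchronized via NTP is still what aligns the start of each interval on both devices.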

送受信処理部５９は、当該処理のスタート（ステップＳ５０１における映像ファイルを取得して）から時間Ｎの間待機する（ステップＳ５０７）。そして、送受信処理部５９は、時間Ｎが経過したときに、遠隔時間情報Ｔｂの送信のリクエストをデータ送受信装置３へ送信し、遠隔時間情報Ｔｂを受信する（ステップＳ５０８）。 The transmission/reception processing unit 59 waits for a time N from the start of the process (obtaining the video file in step S501) (step S507). Then, when the time N has elapsed, the transmission/reception processing unit 59 transmits a request to transmit the remote time information Tb to the data transmission/reception device 3 and receives the remote time information Tb (step S508).

送受信処理部５９は、遠隔時間情報Ｔｂから時間情報Ｔａを減算することで、混雑度合い判定用時間情報Ｔを求める（ステップＳ５０９）。送受信処理部５９は、混雑度合い判定用時間情報Ｔが０よりも小さい（Ｔ＜０、すなわちＴｂ＜Ｔａ）、０よりも大きい（Ｔ＞０、すなわちＴｂ＞Ｔａ）、または０である（Ｔ＝０、すなわちＴｂ＝Ｔａ）を判定する（ステップＳ５１０）。このステップＳ５１０の処理は、図４のステップＳ４０２に対応している。この場合のステップＳ５０１～Ｓ５１０の処理の区間は、図６の時刻０～時刻Ｎまでの間の区間に相当し、「映像ファイル送信」の動作が行われる。 The transmission/reception processing unit 59 obtains time information T for determining the degree of congestion by subtracting time information Ta from remote time information Tb (step S509). The transmission/reception processing unit 59 determines whether the time information T for determining the degree of congestion is smaller than 0 (T<0, i.e. Tb<Ta), larger than 0 (T>0, i.e. Tb>Ta), or 0 (T=0, i.e. Tb=Ta) (step S510). The processing of this step S510 corresponds to step S402 in FIG. 4. The processing period of steps S501 to S510 in this case corresponds to the period from time 0 to time N in FIG. 6, and the operation of "sending a video file" is performed.

送受信処理部５９は、ステップＳ５１０において、混雑度合い判定用時間情報Ｔが０よりも小さい（Ｔ＜０、すなわちＴｂ＜Ｔａ）と判定した場合、クラウドサーバ４への映像ファイルの送信処理を停止する（ステップＳ５１１）。このステップＳ５１１及び後述するステップＳ５１３の処理は、図４のステップＳ４０３～Ｓ４０５に対応している。 If the transmission/reception processing unit 59 determines in step S510 that the congestion degree determination time information T is less than 0 (T<0, i.e., Tb<Ta), it stops the transmission process of the video file to the cloud server 4 (step S511). The process of step S511 and step S513 described later correspond to steps S403 to S405 in FIG. 4.

送受信処理部５９は、キーワードの送信のリクエストをデータ送受信装置３へ送信し、データ送受信装置３からキーワードを受信する（ステップＳ５１２）。 The transmission/reception processing unit 59 transmits a request to transmit the keyword to the data transmission/reception device 3 and receives the keyword from the data transmission/reception device 3 (step S512).

送受信処理部５９は、ステップＳ５０８における遠隔時間情報Ｔｂの送信のリクエストをデータ送受信装置３へ送信してから時間Ｎの間待機する（ステップＳ５１３）。そして、送受信処理部５９は、時間Ｎが経過したときに、当該送受信処理部５９の処理が完了していない場合（ステップＳ５１５：Ｎ）、スタートへ移行して送信を再開し、ステップＳ５０１～Ｓ５１０の処理を行う。 The transmission/reception processing unit 59 waits for a time N after sending a request to the data transmission/reception device 3 to transmit the remote time information Tb in step S508 (step S513). Then, if the processing of the transmission/reception processing unit 59 has not been completed when the time N has elapsed (step S515: N), the transmission/reception processing unit 59 transitions to START and resumes transmission, and performs the processing of steps S501 to S510.

この場合のステップＳ５１１～Ｓ５１３の処理の区間は、図６の時刻Ｎ～時刻２Ｎまでの間の区間に相当し、「送信停止」の動作が行われる。そして、その後のステップＳ５０１～Ｓ５１０の処理の区間は、図６の時刻２Ｎ～時刻３Ｎまでの間の区間に相当し、「送信再開」の動作が行われる。 In this case, the processing period from steps S511 to S513 corresponds to the period from time N to time 2N in FIG. 6, and a "transmission stop" operation is performed. Then, the processing period from steps S501 to S510 thereafter corresponds to the period from time 2N to time 3N in FIG. 6, and a "transmission resume" operation is performed.

一方、送受信処理部５９は、ステップＳ５１０において、混雑度合い判定用時間情報Ｔが０よりも大きい（Ｔ＞０、すなわちＴｂ＞Ｔａ）と判定し、当該送受信処理部５９の処理が完了していない場合（ステップＳ５１５：Ｎ）、スタートへ移行する。そして、送受信処理部５９は、ステップＳ５０１～Ｓ５１０の処理を行う。この場合のステップＳ５０１～Ｓ５１０の処理の区間は、図６の時刻３Ｎ～時刻４Ｎまでの間の区間に相当し、「映像ファイル送信」の動作が行われる。 On the other hand, if the transmission/reception processing unit 59 determines in step S510 that the congestion degree determination time information T is greater than 0 (T>0, i.e., Tb>Ta) and its processing is not yet complete (step S515: N), it returns to START. The transmission/reception processing unit 59 then performs the processing of steps S501 to S510 again. In this case, the processing period of steps S501 to S510 corresponds to the period from time 3N to time 4N in FIG. 6, and the "sending a video file" operation is performed.

また、送受信処理部５９は、ステップＳ５１０において、混雑度合い判定用時間情報Ｔが０である（Ｔ＝０、すなわちＴｂ＝Ｔａ）と判定した場合、前回と同じ処理を行う（ステップＳ５１４）。 In addition, if the transmission/reception processing unit 59 determines in step S510 that the congestion degree determination time information T is 0 (T = 0, i.e. Tb = Ta), it performs the same processing as the previous time (step S514).

つまり、送受信処理部５９は、前回のステップＳ５１０の処理において、混雑度合い判定用時間情報Ｔが０よりも小さい（Ｔ＜０、すなわちＴｂ＜Ｔａ）と判定した場合、前述のステップＳ５１１～Ｓ５１３の処理を行う。一方、送受信処理部５９は、前回のステップＳ５１０の処理において、混雑度合い判定用時間情報Ｔが０よりも大きい（Ｔ＞０、すなわちＴｂ＞Ｔａ）と判定した場合、ステップＳ５１５へ移行する。 In other words, if the transmission/reception processing unit 59 determines in the previous processing of step S510 that the time information T for determining the degree of congestion is less than 0 (T<0, i.e., Tb<Ta), it performs the processing of steps S511 to S513 described above. On the other hand, if the transmission/reception processing unit 59 determines in the previous processing of step S510 that the time information T for determining the degree of congestion is greater than 0 (T>0, i.e., Tb>Ta), it proceeds to step S515.

図６の例では、前回のステップＳ５１０の処理において、混雑度合い判定用時間情報Ｔが０よりも大きい（Ｔ＞０、すなわちＴｂ＞Ｔａ）と判定されている。このため、送受信処理部５９は、ステップＳ５１５へ移行し、当該送受信処理部５９の処理が完了していない場合（ステップＳ５１５：Ｎ）、スタートへ移行して送信を継続し、ステップＳ５０１～Ｓ５１０の処理を行う。この場合のステップＳ５０１～Ｓ５１０の処理の区間は、図６の時刻４Ｎ～時刻５Ｎまでの間の区間に相当し、「映像ファイル送信」の動作が行われる。 In the example of FIG. 6, in the previous processing of step S510, it was determined that the congestion degree determination time information T was greater than 0 (T>0, i.e. Tb>Ta). Therefore, the transmission/reception processing unit 59 proceeds to step S515, and if the processing of the transmission/reception processing unit 59 is not completed (step S515:N), it proceeds to start and continues transmission, and performs the processing of steps S501 to S510. In this case, the processing period of steps S501 to S510 corresponds to the period from time 4N to time 5N in FIG. 6, and the operation of "sending video file" is performed.
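
The branch at step S510 and the resulting sequence of intervals can be sketched as below. This is a hedged illustration only: `timeline` is a hypothetical helper, its input is the list of values T = Tb - Ta judged at the end of each interval in which the device transmitted, and the handling of T = 0 when there is no previous decision (treated as "continue sending") is an assumption, since the text does not define that case.

```python
from typing import Iterable, List


def timeline(t_values: Iterable[float]) -> List[str]:
    """Label each interval of length N as 'send', 'stop', or 'resume'."""
    ops = ["send"]   # the first interval always transmits
    last_sign = 1    # assumption: T == 0 with no history means "continue"
    for t in t_values:
        sign = (t > 0) - (t < 0)
        if sign == 0:            # step S514: repeat the previous decision
            sign = last_sign
        last_sign = sign
        if sign < 0:             # steps S511 to S513: stop for N, then restart
            ops += ["stop", "resume"]
        else:                    # T > 0: keep transmitting
            ops.append("send")
    return ops
```

With `timeline([-5, 3, 0])` the result is `['send', 'stop', 'resume', 'send', 'send']`, which corresponds to the five intervals from time 0 to time 5N described for FIG. 6.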

このように、図１に示したデータ送受信システム１において、データ送受信装置２，３に備えたそれぞれの送受信処理部５９は、時間Ｎの区間毎に「映像ファイル送信」、「送信停止」及び「送信再開」のうちのいずれかの処理にて独立して動作する。この場合の動作は同期している。 In this way, in the data transmission/reception system 1 shown in FIG. 1, the transmission/reception processing units 59 provided in the data transmission/reception devices 2 and 3 each independently perform one of "video file transmission," "stop transmission," and "resume transmission" in each interval of time N. Although each unit makes its own decision, the interval timing of the two units is synchronized.

それぞれの送受信処理部５９は、初期処理の時間Ｎの区間において「映像ファイル送信」の処理にて動作する。そして、送受信処理部５９は、前記式（２）にて、自らの装置におけるインターネット３０の混雑度合いを示す混雑度合い判定用時間情報Ｔを算出する。 Each transmission/reception processing unit 59 operates in the "video file transmission" process during the initial processing time N. Then, the transmission/reception processing unit 59 calculates the congestion degree determination time information T, which indicates the congestion degree of the Internet 30 on its own device, using the above formula (2).

送受信処理部５９は、混雑度合い判定用時間情報Ｔに基づき、自らの装置が他の装置よりも混雑度合いが低いと判定した場合、「映像ファイル送信」の処理にて動作を継続する。一方、送受信処理部５９は、自らの装置が他の装置よりも混雑度合いが高いと判定した場合、映像ファイルの送信処理を停止する「送信停止」、及び送信処理を再開する「送信再開」の処理にて動作する。 When the transmission/reception processing unit 59 determines that its own device is less congested than other devices based on the congestion level determination time information T, it continues operation with the "video file transmission" process. On the other hand, when the transmission/reception processing unit 59 determines that its own device is more congested than other devices, it operates with the "transmission stop" process to stop the video file transmission process, and the "transmission resume" process to resume the transmission process.

図７は、データ送受信装置２，３による映像ファイルのアップロード送信期間を説明するイメージ図であり、本発明の実施形態の効果を説明する図でもある。 Figure 7 is an image diagram explaining the upload transmission period of a video file by the data transmission/reception devices 2 and 3, and also explains the effect of an embodiment of the present invention.

前述のとおり、データ送受信装置２，３は、それぞれ映像ファイルをクラウドサーバ４へ送信する役割を担っている。映像ファイルをクラウドサーバ４へ送信する役割を担う期間（映像ファイルの送信期間）には、図７に示すとおり、両装置が同時に当該役割を担う期間Ｔ１，Ｔ３，Ｔ５、及びいずれか一方の装置が当該役割を担う期間Ｔ２，Ｔ４，Ｔ６がある。 As described above, data transmission/reception devices 2 and 3 each play a role in transmitting video files to cloud server 4. As shown in FIG. 7, the period during which they play a role in transmitting video files to cloud server 4 (video file transmission period) includes periods T1, T3, and T5 during which both devices simultaneously play that role, and periods T2, T4, and T6 during which only one of the devices plays that role.

映像ファイルを送信する役割を担うか否かは、データ送受信装置２，３が接続されるインターネット３０の混雑度合いに基づいて判断される。インターネット３０の混雑度合いは、映像ファイルの送信を開始してからキーワードを受信するまでの間の時間を基準として、前記式（２）の混雑度合い判定用時間情報Ｔに基づき判断される（図４のステップＳ４０２、図５のステップＳ５１０）。 Whether or not the data transmission/reception device 2, 3 is responsible for transmitting the video file is determined based on the degree of congestion of the Internet 30 to which the data transmission/reception device 2, 3 is connected. The degree of congestion of the Internet 30 is determined based on the time information T for determining the degree of congestion in the above formula (2) using the time from when the transmission of the video file begins to when the keyword is received as a reference (step S402 in FIG. 4, step S510 in FIG. 5).

一方で、インターネット３０の輻輳状況は時々刻々と変化するため、映像ファイルの送信を停止した後、時間Ｎが経過したときに送信停止が解除され、送信が再開される（図４のステップＳ４０３～Ｓ４０５、図５のステップＳ５１１，Ｓ５１３）。そして、インターネット３０の混雑度合いが判断される。このような動作が繰り返される。 However, since the congestion status of the Internet 30 changes from moment to moment, after the transmission of the video file is stopped, the transmission suspension is lifted and transmission is resumed when time N has elapsed (steps S403 to S405 in FIG. 4, steps S511 and S513 in FIG. 5). Then, the degree of congestion of the Internet 30 is determined. This operation is repeated.

期間Ｔ１，Ｔ３，Ｔ５は、データ送受信装置２，３が映像ファイルを同時に送信する期間である。期間Ｔ２，Ｔ４は、データ送受信装置３のみが映像ファイルを送信する期間であり、データ送受信装置２は映像ファイルを送信しない。また、期間Ｔ６は、データ送受信装置２のみが映像ファイルを送信する期間であり、データ送受信装置３は映像ファイルを送信しない。 Periods T1, T3, and T5 are periods during which data transmission/reception devices 2 and 3 transmit video files simultaneously. Periods T2 and T4 are periods during which only data transmission/reception device 3 transmits video files, and data transmission/reception device 2 does not transmit video files. Furthermore, period T6 is a period during which only data transmission/reception device 2 transmits video files, and data transmission/reception device 3 does not transmit video files.

図７に示した期間Ｔ２，Ｔ４，Ｔ６は、このような動作により、アップロードトラフィックの発生が低減されることを示している。これにより、データ送受信装置２，３の両方から常に映像ファイルを送信する必要はなく、いずれか一方から映像ファイルを送信すれば済み、期間Ｔ２，Ｔ４，Ｔ６において、クラウドサービスにかかるコストを低減することができる。 Time periods T2, T4, and T6 shown in FIG. 7 show that such operations reduce the generation of upload traffic. This eliminates the need to constantly send video files from both data transmission/reception devices 2 and 3, and allows video files to be sent from only one of them, making it possible to reduce costs for cloud services during time periods T2, T4, and T6.
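
The reduction can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that the six periods of FIG. 7 have equal length and that each transmitting device generates one unit of upload traffic per period; neither assumption is stated in the text, and `PERIODS` and `upload_counts` are hypothetical names.

```python
from typing import Dict, Tuple

# Devices that upload in each period of FIG. 7, as stated in the text.
PERIODS: Dict[str, Tuple[int, ...]] = {
    "T1": (2, 3), "T2": (3,), "T3": (2, 3),
    "T4": (3,), "T5": (2, 3), "T6": (2,),
}


def upload_counts(periods: Dict[str, Tuple[int, ...]]) -> Tuple[int, int]:
    """Return (actual uploads, uploads if both devices always transmitted)."""
    actual = sum(len(devices) for devices in periods.values())
    baseline = 2 * len(periods)
    return actual, baseline
```

Under these assumptions `upload_counts(PERIODS)` gives 9 upload units against a baseline of 12, that is, one unit saved in each of periods T2, T4, and T6.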

図８は、両方のデータ送受信装置２，３のアップロードトラフィックが発生する期間Ｔ１，Ｔ３，Ｔ５のデータフロー（ａ）を説明する図である。期間Ｔ１，Ｔ３，Ｔ５は、初期処理の期間（両データ送受信装置２，３にて「映像ファイル送信」の期間）、または、データ送受信装置２，３の一方における「送信再開」の期間及び他方における「映像ファイル送信」の期間である。 Figure 8 is a diagram explaining data flow (a) during periods T1, T3, and T5 when upload traffic occurs in both data transmission/reception devices 2 and 3. Periods T1, T3, and T5 are periods of initial processing (periods of "video file transmission" in both data transmission/reception devices 2 and 3), or periods of "transmission resumption" in one of the data transmission/reception devices 2 and 3 and "video file transmission" in the other.

期間Ｔ１，Ｔ３，Ｔ５においては、データ送受信装置２，３は、映像ファイルをクラウドサーバ４へ送信し、クラウドサーバ４からキーワードを受信する。また、データ送受信装置２，３は、対応するデータ送受信装置３，２から、データ送受信装置３，２における時間情報Ｔａを遠隔時間情報Ｔｂとして受信する。さらに、データ送受信装置２，３は、発声ファイルを遠隔発声ファイルとし、対応するデータ送受信装置３，２へ送信する。 During periods T1, T3, and T5, the data transmission/reception devices 2 and 3 transmit video files to the cloud server 4 and receive keywords from the cloud server 4. The data transmission/reception devices 2 and 3 also receive time information Ta in the data transmission/reception devices 3 and 2 from the corresponding data transmission/reception devices 3 and 2 as remote time information Tb. Furthermore, the data transmission/reception devices 2 and 3 transmit the voice file as a remote voice file to the corresponding data transmission/reception devices 3 and 2.

図９は、一方のデータ送受信装置２のアップロードトラフィックが発生しない期間Ｔ２，Ｔ４のデータフロー（ｂ）を説明する図である。期間Ｔ２，Ｔ４は、データ送受信装置２におけるインターネット３０の混雑度合いがデータ送受信装置３よりも高いと判定された場合の期間である。つまり、データ送受信装置２における「送信停止」の期間、データ送受信装置３における「映像ファイル送信」の期間である。 Figure 9 is a diagram explaining the data flow (b) during periods T2 and T4 when no upload traffic occurs in one data transmission/reception device 2. Periods T2 and T4 are periods when the degree of congestion on the Internet 30 in data transmission/reception device 2 is determined to be higher than that in data transmission/reception device 3. In other words, they are periods of "transmission stopped" in data transmission/reception device 2 and periods of "video file transmission" in data transmission/reception device 3.

期間Ｔ２，Ｔ４において、データ送受信装置３は、映像ファイルをクラウドサーバ４へ送信し、クラウドサーバ４からキーワードを受信し、キーワードをデータ送受信装置２へ送信する。この場合、データ送受信装置２は、映像ファイルを送信せず、データ送受信装置３からキーワードを受信する。また、データ送受信装置３は、データ送受信装置２から、データ送受信装置２における時間情報Ｔａを遠隔時間情報Ｔｂとして受信する。さらに、データ送受信装置２，３は、発声ファイルを遠隔発声ファイルとして、対応するデータ送受信装置３，２へ送信する。 During periods T2 and T4, the data transmission/reception device 3 transmits a video file to the cloud server 4, receives a keyword from the cloud server 4, and transmits the keyword to the data transmission/reception device 2. In this case, the data transmission/reception device 2 does not transmit a video file, but receives the keyword from the data transmission/reception device 3. In addition, the data transmission/reception device 3 receives the time information Ta at the data transmission/reception device 2 from the data transmission/reception device 2 as remote time information Tb. Furthermore, the data transmission/reception devices 2, 3 transmit the voice file to the corresponding data transmission/reception device 3, 2 as a remote voice file.

図１０は、他方のデータ送受信装置３のアップロードトラフィックが発生しない期間Ｔ６のデータフロー（ｃ）を説明する図である。期間Ｔ６は、データ送受信装置２におけるインターネット３０の混雑度合いがデータ送受信装置３よりも低いと判定された場合の期間である。つまり、データ送受信装置２における「映像ファイル送信」の期間、データ送受信装置３における「送信停止」の期間である。 Figure 10 is a diagram explaining the data flow (c) during period T6 when no upload traffic occurs in the other data transmission/reception device 3. Period T6 is a period when the degree of congestion of the Internet 30 in the data transmission/reception device 2 is determined to be lower than that in the data transmission/reception device 3. In other words, it is a period of "video file transmission" in the data transmission/reception device 2 and a period of "transmission stopped" in the data transmission/reception device 3.

期間Ｔ６において、データ送受信装置２は、映像ファイルをクラウドサーバ４へ送信し、クラウドサーバ４からキーワードを受信し、キーワードをデータ送受信装置３へ送信する。この場合、データ送受信装置３は、映像ファイルを送信せず、データ送受信装置２からキーワードを受信する。また、データ送受信装置２は、データ送受信装置３から、データ送受信装置３における時間情報Ｔａを遠隔時間情報Ｔｂとして受信する。さらに、データ送受信装置２，３は、発声ファイルを遠隔発声ファイルとして、対応するデータ送受信装置３，２へ送信する。 During period T6, the data transmission/reception device 2 transmits a video file to the cloud server 4, receives a keyword from the cloud server 4, and transmits the keyword to the data transmission/reception device 3. In this case, the data transmission/reception device 3 does not transmit a video file, but receives the keyword from the data transmission/reception device 2. In addition, the data transmission/reception device 2 receives the time information Ta at the data transmission/reception device 3 from the data transmission/reception device 3 as remote time information Tb. Furthermore, the data transmission/reception devices 2, 3 transmit the voice file to the corresponding data transmission/reception device 3, 2 as a remote voice file.

以上のように、本発明の実施形態のデータ送受信装置２によれば、映像取得部５１は、ディスプレイ１３に表示された映像のデータを取得し、映像ファイル生成部５２は、所定の映像ファイル形式の映像ファイルを生成する。発声取得部５３は、人１２の発声のデータを取得し、発声ファイル生成部５４は、所定の音声ファイル形式の発声ファイルを生成する。 As described above, according to the data transmission/reception device 2 of the embodiment of the present invention, the video acquisition unit 51 acquires data of the video displayed on the display 13, and the video file generation unit 52 generates a video file in a predetermined video file format. The speech acquisition unit 53 acquires data of the person 12's speech, and the speech file generation unit 54 generates a speech file in a predetermined audio file format.

送受信処理部５９は、映像ファイルをクラウドサーバ４へ送信し、クラウドサーバ４からキーワードを受信することで、当該データ送受信装置２におけるラウンドトリップタイムに相当する時間情報Ｔａを算出する。また、送受信処理部５９は、発声ファイルをデータ送受信装置３へ送信し、データ送受信装置３から遠隔発声ファイルを受信する。 The transmission/reception processing unit 59 transmits the video file to the cloud server 4 and receives a keyword from the cloud server 4, thereby calculating time information Ta corresponding to the round trip time in the data transmission/reception device 2. The transmission/reception processing unit 59 also transmits a voice file to the data transmission/reception device 3 and receives a remote voice file from the data transmission/reception device 3.

送受信処理部５９は、当該送受信処理部５９の処理のスタート時から時間Ｎだけ待機した後、データ送受信装置３から、データ送受信装置３におけるラウンドトリップタイムに相当する遠隔時間情報Ｔｂを受信する。そして、送受信処理部５９は、遠隔時間情報Ｔｂから時間情報Ｔａを減算することで、混雑度合い判定用時間情報Ｔを算出する。 The transmission/reception processing unit 59 waits for a time N from the start of processing of the transmission/reception processing unit 59, and then receives from the data transmission/reception device 3 remote time information Tb corresponding to the round trip time in the data transmission/reception device 3. The transmission/reception processing unit 59 then calculates the time information T for determining the congestion degree by subtracting the time information Ta from the remote time information Tb.

送受信処理部５９は、混雑度合い判定用時間情報Ｔ＜０の場合、クラウドサーバ４への映像ファイルの送信処理を停止し、データ送受信装置３からキーワードを受信し、時間Ｎの間待機した後、映像ファイルの送信処理を再開する。一方、送受信処理部５９は、混雑度合い判定用時間情報Ｔ＞０の場合、クラウドサーバ４への映像ファイルの送信処理を継続する。また、送受信処理部５９は、混雑度合い判定用時間情報Ｔ＝０の場合、前回と同じ処理を行う。 When the time information for determining the degree of congestion T<0, the transmission/reception processing unit 59 stops the process of transmitting the video file to the cloud server 4, receives a keyword from the data transmission/reception device 3, waits for a period of time N, and then resumes the process of transmitting the video file. On the other hand, when the time information for determining the degree of congestion T>0, the transmission/reception processing unit 59 continues the process of transmitting the video file to the cloud server 4. Furthermore, when the time information for determining the degree of congestion T=0, the transmission/reception processing unit 59 performs the same process as the previous time.

遠隔発声再生部５５は、送受信処理部５９が受信した遠隔発声ファイルを再生する。これにより、場所Ａに存在する人１２は、遠隔の場所Ｂに存在する人２２の発声を聞くことができる。発話文生成部５６は、送受信処理部５９が受信したキーワードに基づいた発話文を生成し、発話文再生部５８は、発話文を再生する。これにより、場所Ａに存在する人１２は、キーワードに基づいた発話文を聞くことができ、視聴している映像に関連する情報を得ることができる。 The remote speech playback unit 55 plays the remote speech file received by the transmission/reception processing unit 59. This allows the person 12 in place A to hear the speech of the person 22 in the remote place B. The speech sentence generation unit 56 generates a speech sentence based on the keyword received by the transmission/reception processing unit 59, and the speech sentence playback unit 58 plays the speech sentence. This allows the person 12 in place A to hear the speech sentence based on the keyword and obtain information related to the video they are watching.

このように、インターネット３０の混雑度合いの低いデータ送受信装置２，３のいずれか一方から映像ファイルが送信され、混雑度合いの高い他方からは映像ファイルが送信されない。このため、データ送受信装置２，３の両方から常に映像ファイルを送信する必要はなく、図７に示した期間Ｔ２，Ｔ４，Ｔ６のように、いずれか一方のみから映像ファイルを送信する期間が存在することとなる。つまり、映像ファイルをクラウドサーバ４へ送信するアップロードトラフィックの発生を低減することができ、クラウドサービスにかかるコストを低減することができる。 In this way, a video file is transmitted from one of the data transmission/reception devices 2, 3, which has a lower degree of congestion on the Internet 30, and a video file is not transmitted from the other, which has a higher degree of congestion. For this reason, it is not necessary to transmit video files from both data transmission/reception devices 2, 3 at all times, and there are periods when a video file is transmitted from only one of them, such as periods T2, T4, and T6 shown in Figure 7. In other words, it is possible to reduce the generation of upload traffic when transmitting video files to the cloud server 4, and it is possible to reduce the costs associated with cloud services.

また、本発明の実施形態では、インターネット３０の混雑度合いの低いデータ送受信装置２，３のいずれか一方が映像ファイルをクラウドサーバ４へ送信し、クラウドサーバ４からキーワードを受信して他方へ転送するようにした。このキーワードは、映像ファイルのように大容量ではない。このため、データ送受信装置２，３は、混雑度合いの低いインターネット３０の環境の下でキーワードを迅速に取得することができ、キーワードの受信遅れに伴うロボット１１，２１による発話のタイミングの遅れを抑制することができる。つまり、ロボット１１，２１は、ディスプレイ１３，２３に表示される同一の映像に対してさほど遅れることなく迅速なタイミングで、当該ロボット１１，２１間でほぼ同時に、映像に関連する同じキーワードに基づいた発話を行うことができる。 In addition, in an embodiment of the present invention, one of the data transmission/reception devices 2, 3 on the less congested Internet 30 transmits a video file to the cloud server 4, receives a keyword from the cloud server 4, and transfers it to the other. This keyword is not large in volume like a video file. Therefore, the data transmission/reception devices 2, 3 can quickly acquire the keyword in a less congested Internet 30 environment, and can suppress delays in the timing of the speech by the robots 11, 21 that are associated with delays in receiving the keyword. In other words, the robots 11, 21 can make speech based on the same keyword related to the video almost simultaneously between the robots 11, 21 with rapid timing and without much delay in response to the same video displayed on the displays 13, 23.

このため、人１２，２２の会話とロボット１１，２１の発話とは、大きくずれることがなく、ロボット１１，２１の発話から受ける違和感を緩和することができ、同一の映像を視聴している人１２，２２間で会話の活性化を図ることができる。 As a result, there is no significant discrepancy between the conversation between the people 12 and 22 and the speech of the robots 11 and 21, and the sense of discomfort felt from the speech of the robots 11 and 21 can be alleviated, making it possible to stimulate conversation between the people 12 and 22 who are watching the same video.

以上、実施形態を挙げて本発明を説明したが、本発明は前記実施形態に限定されるものではなく、その技術思想を逸脱しない範囲で種々変形可能である。 The present invention has been described above using embodiments, but the present invention is not limited to the above embodiments and can be modified in various ways without departing from the technical concept thereof.

（時間情報Ｔａ及び遠隔時間情報Ｔｂに基づく混雑度合いの判定） (Determination of the degree of congestion based on time information Ta and remote time information Tb)
例えば前記実施形態において、データ送受信装置２の送受信処理部５９の送信制御部７０は、混雑度合い判定用時間情報Ｔに基づいて、当該データ送受信装置２における混雑度合いを判定するようにした。これに対し、送信制御部７０は、時間情報Ｔａ及び遠隔時間情報Ｔｂに基づいて、当該データ送受信装置２における混雑度合いを判定するようにしてもよい。 For example, in the above embodiment, the transmission control unit 70 of the transmission/reception processing unit 59 of the data transmission/reception device 2 determines the congestion degree of the data transmission/reception device 2 based on the congestion degree determination time information T. In contrast to this, the transmission control unit 70 may determine the congestion degree of the data transmission/reception device 2 based on the time information Ta and the remote time information Tb.

（場所Ａにおける人１２，２２の発声及びロボット１１の発話の重なりの回避） (Avoiding overlapping of speech between people 12, 22 and robot 11 in place A)
また、前記実施形態において、データ送受信装置２の発声取得部５３は、場所Ａに存在する人１２の発声のデータを取得するようにした。また、遠隔発声再生部５５は、遠隔の場所Ｂに存在する人２２の発声の遠隔発声ファイルを再生し、発話文再生部５８は、映像に関連するキーワードに基づいた発話文を再生するようにした。 In the embodiment, the speech acquisition unit 53 of the data transmission/reception device 2 acquires data of speech from the person 12 present at the location A. The remote speech playback unit 55 plays a remote speech file of the speech from the person 22 present at the remote location B, and the speech sentence playback unit 58 plays a speech sentence based on a keyword related to the video.

ここで、人１２の発声とキーワードに基づいた発話文の再生とが同時に行われる場合があり、また、人２２の発声の遠隔発声ファイルの再生とキーワードに基づいた発話文の再生とが同時に行われる場合もある。これでは、人１２は、当該人１２の発声とロボット１１の発話とが重なってしまい、ロボット１１の発話を聞くことができない可能性があり、また、人２２の発声とロボット１１の発話とが重なってしまい、これらを聞き分けることができない可能性がある。 Here, the speech of person 12 and the playback of the spoken sentence based on the keyword may occur simultaneously, and the playback of a remote speech file of the speech of person 22 and the playback of the spoken sentence based on the keyword may also occur simultaneously. In the former case, person 12's own speech overlaps with the speech of robot 11, so person 12 may be unable to hear the robot. In the latter case, the speech of person 22 overlaps with the speech of robot 11, so person 12 may be unable to tell them apart.

この問題を解決するために、発声取得部５３は、人１２の発声のデータを取得しているときに、取得中であることを示す信号（取得中信号）を発話文再生部５８に出力する。また、遠隔発声再生部５５は、人２２の発声の遠隔発声ファイルを再生しているときに、再生中であることを示す信号（再生中信号）を発話文再生部５８に出力する。 To solve this problem, when the speech acquisition unit 53 is acquiring data of the speech of person 12, it outputs a signal indicating that acquisition is in progress (acquiring signal) to the spoken sentence playback unit 58. Also, when the remote speech playback unit 55 is playing back the remote speech file of the speech of person 22, it outputs a signal indicating that playback is in progress (playing signal) to the spoken sentence playback unit 58.

発話文再生部５８は、発話文の再生を開始する際に、発声取得部５３から取得中信号を入力している場合、取得中信号を入力しなくなるまで待機する。また、発話文再生部５８は、発話文の再生を開始する際に、遠隔発声再生部５５から再生中信号を入力している場合、再生中信号を入力しなくなるまで待機する。そして、発話文再生部５８は、待機後に取得中信号及び再生中信号を入力していない場合、発話文の再生を開始する。 When the spoken sentence playback unit 58 is about to start playing back the spoken sentence, if it is receiving the acquiring signal from the speech acquisition unit 53, it waits until that signal is no longer input. Likewise, if it is receiving the playing signal from the remote speech playback unit 55, it waits until that signal is no longer input. Then, once neither the acquiring signal nor the playing signal is being input, the spoken sentence playback unit 58 starts playing back the spoken sentence.

これにより、発話文再生部５８による発話文の再生は、発声取得部５３による人１２の発声のデータの取得が完了するまで待った後に行われ、また、遠隔発声再生部５５による人２２の発声の遠隔発声ファイルの再生が完了するまで待った後に行われる。 As a result, the spoken sentence is played back by the spoken sentence playback unit 58 after waiting until the speech acquisition unit 53 has completed acquiring the data of the speech of person 12, and also after the remote speech playback unit 55 has completed playing back the remote speech file of the speech of person 22.

つまり、人１２，２２の発声及びロボット１１の発話の時間的な重なりを回避することができる。したがって、人１２は、ロボット１１の発話及び人２２の発声を確実に聞くことができ、映像を視聴している人１２，２２間の会話の活性化を一層図ることができる。 In other words, it is possible to avoid a time overlap between the speech of the people 12, 22 and the speech of the robot 11. Therefore, the person 12 can reliably hear the speech of the robot 11 and the speech of the person 22, which can further stimulate conversation between the people 12, 22 who are watching the video.
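
The waiting behavior of the spoken sentence playback unit 58 can be sketched as a simple polling loop. This is an assumption-laden illustration: `play_when_clear` and its parameters are hypothetical names, the two signals are modeled as callables that return `True` while the signal is being input, and polling is only one possible mechanism (the text does not say how the signals are delivered).

```python
import time
from typing import Callable


def play_when_clear(
    acquiring: Callable[[], bool],
    remote_playing: Callable[[], bool],
    play: Callable[[], None],
    poll_interval: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Start playback only when neither signal is active; return poll count."""
    polls = 0
    # Wait while person 12 is speaking or person 22's remote file is playing.
    while acquiring() or remote_playing():
        sleep(poll_interval)
        polls += 1
    play()  # both signals are clear, so the spoken sentence can start
    return polls
```

Because both signals are checked in the same loop condition, playback starts only at a moment when neither is active, which mirrors the final check described after the waiting steps.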

（時間Ｎの変更） (Change in Time N)
また、前記実施形態において、データ送受信装置２の送受信処理部５９の送信制御部７０は、予め設定された時間Ｎを単位として、時間Ｎの区間毎に「映像ファイル送信」、「送信停止」及び「送信再開」のうちのいずれかの処理にて動作するようにした。この場合の時間Ｎは固定であるが、可変とするようにしてもよい。 In the above embodiment, the transmission control unit 70 of the transmission/reception processing unit 59 of the data transmission/reception device 2 operates in one of the processes of "video file transmission", "stop transmission" and "resume transmission" for each section of a preset time N. In this case, the time N is fixed, but may be variable.

例えば送信制御部７０は、混雑度合い判定用時間情報Ｔの絶対値が予め設定された閾値よりも大きい場合、予め設定された時間Ｎよりも大きい値を新たな時間Ｎに設定する。その後、送信制御部７０は、混雑度合い判定用時間情報Ｔの絶対値が前記閾値以下に変化した場合、元の時間Ｎ（予め設定された時間Ｎ）に戻す。 For example, if the absolute value of the time information T for determining the degree of congestion is greater than a preset threshold, the transmission control unit 70 sets a new time N to a value greater than the preset time N. If the absolute value of the time information T for determining the degree of congestion then falls below the threshold, the transmission control unit 70 returns the new time N to the original time N (the preset time N).

また、送信制御部７０は、混雑度合い判定用時間情報Ｔの絶対値と複数の異なる閾値とを用いた閾値処理にて、予め設定された時間Ｎよりも大きい値を新たな時間Ｎに設定することで、新たな時間Ｎを閾値に応じて段階的に変化させるようにしてもよい。また、送信制御部７０は、新たな時間Ｎが混雑度合い判定用時間情報Ｔの絶対値に比例するように、予め設定された時間Ｎと所定の最大値との間で、新たなＮを変化させるようにしてもよい。 The transmission control unit 70 may also perform threshold processing using the absolute value of the congestion degree determination time information T and multiple different thresholds to set the new time N to a value greater than the preset time N, thereby gradually changing the new time N according to the threshold. The transmission control unit 70 may also change the new N between the preset time N and a predetermined maximum value so that the new time N is proportional to the absolute value of the congestion degree determination time information T.

これにより、例えば場所Ａに設置されたデータ送受信装置２に接続されるインターネット３０の回線と、場所Ｂに設置されたデータ送受信装置３に接続されるインターネット３０の回線とが異なり、回線速度の差が明らかである場合（例えば一方が光回線、他方がＡＤＳＬの場合）、インターネット３０の混雑度合いを判定する頻度を低くすることができる。つまり、回線速度の差が明らかな場合は、映像ファイルを送信してキーワードを受信し、混雑度合い判定用時間情報Ｔを算出する等の「映像ファイル送信」、「送信停止」及び「送信再開」の処理の頻度を低くし、データ送受信装置２，３の処理負荷を低減することができる。 As a result, when the Internet 30 line connected to the data transmission/reception device 2 installed at location A differs from the Internet 30 line connected to the data transmission/reception device 3 installed at location B and the difference in line speed is clear (for example, when one is an optical line and the other is ADSL), the congestion level of the Internet 30 can be determined less frequently. In other words, when the difference in line speed is clear, the "video file transmission", "stop transmission", and "resume transmission" processes, which involve transmitting a video file, receiving a keyword, and calculating the time information T for determining the congestion level, are performed less often, which reduces the processing load of the data transmission/reception devices 2 and 3.
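
The stepwise and proportional ways of changing the time N described above can be sketched as follows. These are illustrative assumptions rather than the embodiment's implementation: `stepped_interval`, `proportional_interval`, the `(threshold, interval)` pairs, and the proportionality constant `gain` are all hypothetical, and the text gives no concrete threshold or interval values.

```python
from typing import Sequence, Tuple


def stepped_interval(
    t: float, base_n: float, steps: Sequence[Tuple[float, float]]
) -> float:
    """Pick N from (threshold, interval) pairs sorted by ascending threshold."""
    n = base_n  # |T| at or below every threshold: use the preset N
    for threshold, interval in steps:
        if abs(t) > threshold:
            n = interval  # the highest exceeded threshold wins
    return n


def proportional_interval(
    t: float, base_n: float, max_n: float, gain: float
) -> float:
    """Scale N with |T|, clamped to the range [base_n, max_n]."""
    return min(max(gain * abs(t), base_n), max_n)
```

Passing a single `(threshold, interval)` pair to `stepped_interval` reproduces the single-threshold case, including the return to the preset N once the absolute value of T falls to the threshold or below.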

Note that the data transmission/reception devices 2 and 3 may store the newly set time N in memory. In that case, when processing of the data transmission/reception system 1 by the data transmission/reception devices 2 and 3 resumes, the data transmission/reception devices 2 and 3 read the new time N from the memory and start the "video file transmission" process using it.

When the Internet 30 lines connected to the data transmission/reception devices 2 and 3 are fixed lines such as optical lines or ADSL, the relative quality of each Internet 30 environment is clear and the degree of congestion does not change much. Therefore, by storing in memory a new time N that reflects the relative quality of the Internet 30 environments, the data transmission/reception devices 2 and 3 can use that time N from the start of processing, which makes the processing efficient.

(Transmission and reception of utterance sentences based on keywords)
In the above embodiment, when the transmission/reception processing unit 59 of the data transmission/reception device 2 receives a keyword from the cloud server 4, it transmits the received keyword to the data transmission/reception device 3. Alternatively, the transmission/reception processing unit 59 may transmit, instead of the keyword, the utterance sentence generated by the utterance sentence generation unit 56 to the data transmission/reception device 3.

Specifically, when the keyword reception unit 67 of the transmission/reception processing unit 59 of the data transmission/reception device 2 receives a keyword from the cloud server 4, it outputs the keyword to the utterance sentence generation unit 56 via the keyword transfer unit 61. The utterance sentence generation unit 56 receives the keyword from the keyword reception unit 67 via the keyword transfer unit 61, generates an utterance sentence based on the keyword, and outputs that utterance sentence to an utterance sentence transmission unit (not shown in FIG. 3) provided in the transmission/reception processing unit 59. The utterance sentence transmission unit receives the keyword-based utterance sentence from the utterance sentence generation unit 56 and transmits it to the data transmission/reception device 3.

When the data transmission/reception device 3 has generated an utterance sentence based on a keyword received from the cloud server 4, an utterance sentence reception unit (not shown in FIG. 3) of the transmission/reception processing unit 59 of the data transmission/reception device 2 receives the keyword-based utterance sentence from the data transmission/reception device 3. The utterance sentence reception unit then outputs the utterance sentence to the utterance sentence playback unit 58, which plays back the keyword-based utterance sentence received from the utterance sentence reception unit.

Because an utterance sentence, like a keyword, involves only a small amount of data, the line does not become congested. In addition, when the data transmission/reception device 2 receives a keyword-based utterance sentence from the data transmission/reception device 3, it only has to play the sentence back as is and does not need to generate an utterance sentence from a keyword, so its processing load is reduced. Note that the data transmission/reception device 3 performs the same processing as the data transmission/reception device 2.
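The difference between forwarding a keyword and forwarding a finished utterance sentence can be sketched as follows. The JSON message layout, the template text, and the function names are hypothetical and serve only to illustrate the idea; the actual templates held by the template storage unit 57 and the wire format between the devices are not specified here.

```python
import json

# Hypothetical template standing in for one held by the template storage unit.
DEFAULT_TEMPLATE = "They just mentioned {keyword}."


def generate_utterance(keyword, template=DEFAULT_TEMPLATE):
    """Fill a template with the keyword to produce an utterance sentence."""
    return template.format(keyword=keyword)


def build_message(keyword, send_sentence):
    """Sender side: forward either the finished sentence or the raw keyword."""
    if send_sentence:
        return json.dumps({"type": "utterance", "text": generate_utterance(keyword)})
    return json.dumps({"type": "keyword", "text": keyword})


def handle_message(raw):
    """Receiver side: return the sentence to be played back."""
    msg = json.loads(raw)
    if msg["type"] == "utterance":
        return msg["text"]  # play back as is, no generation needed
    return generate_utterance(msg["text"])  # generate locally from the keyword
```

In both cases the message is a short text string, so forwarding the sentence instead of the keyword moves the generation step to the sender without adding meaningful traffic.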

Note that an ordinary computer can be used as the hardware configuration of the data transmission/reception devices 2 and 3 according to the embodiment of the present invention. Each of the data transmission/reception devices 2 and 3 is configured by a computer equipped with a CPU, a volatile storage medium such as RAM, a non-volatile storage medium such as ROM, interfaces, and the like.

The functions of the video acquisition unit 51, video file generation unit 52, voice acquisition unit 53, voice file generation unit 54, remote voice playback unit 55, utterance sentence generation unit 56, template storage unit 57, utterance sentence playback unit 58, and transmission/reception processing unit 59 provided in the data transmission/reception devices 2 and 3 are each realized by causing the CPU to execute a program that describes the function.

These programs are stored in the storage medium described above and are read and executed by the CPU. The programs can also be distributed stored on storage media such as magnetic disks (floppy (registered trademark) disks, hard disks, etc.), optical disks (CD-ROMs, DVDs, etc.), and semiconductor memories, and can be transmitted and received via a network.

1 Data transmission/reception system
2 Data transmission/reception device
3 Data transmission/reception device (remote data transmission/reception device)
4 Cloud server
11 Robot (first robot)
12 Person (first user)
13 Display (first display)
21 Robot (second robot)
22 Person (second user)
23 Display (second display)
30 Internet
51 Video acquisition unit
52 Video file generation unit
53 Voice acquisition unit
54 Voice file generation unit
55 Remote voice playback unit (voice playback unit)
56 Utterance sentence generation unit
57 Template storage unit
58 Utterance sentence playback unit
59 Transmission/reception processing unit
60 Video file acquisition unit
61 Keyword transfer unit
62 Clock
63 Time calculation unit
64 Time comparison unit
65 Remote voice file reception unit
66 Keyword reception time acquisition unit
67 Keyword reception unit
68 Keyword transmission unit
69 Video file transmission time acquisition unit
70 Transmission control unit
71 Voice file transmission unit
72 Video file transmission unit
73 Remote time information acquisition unit
74 Time information transmission unit
t1 Transmission start time
t2 Reception completion time
Ta Time information
Tb Remote time information
T Time information for congestion degree determination

Claims

1. A data transmission/reception device provided in a first robot, for use in a data transmission/reception system in which the data transmission/reception device, a remote data transmission/reception device provided in a second robot, and a cloud server are connected via the Internet, wherein a first user watching a video displayed on a first display and the first robot, which makes utterances based on a keyword related to the video, are present at a first location, a second user watching the video displayed on a second display and the second robot, which makes utterances based on the keyword related to the video, are present at a second location different from the first location, and the first user and the second user converse while listening to utterances based on the same keyword,
wherein each of the data transmission/reception device and the remote data transmission/reception device acquires a data file of the video as a video file, transmits the video file to the cloud server, and receives the keyword from the cloud server,
the data transmission/reception device comprising:
a transmission/reception processing unit that acquires a data file of a voice of the first user as a first voice file, transmits the first voice file to the remote data transmission/reception device, and receives a data file of a voice of the second user from the remote data transmission/reception device as a second voice file,
that calculates time information Ta which covers the period from when the video file is transmitted to the cloud server until the keyword is received and which reflects the degree of congestion of the Internet at the data transmission/reception device, transmits the time information Ta to the remote data transmission/reception device, and receives from the remote data transmission/reception device time information Tb reflecting the degree of congestion of the Internet at the remote data transmission/reception device,
and that, when determining based on the time information Ta and the time information Tb that the degree of congestion of the Internet at the data transmission/reception device is higher than that at the remote data transmission/reception device, stops transmitting the video file to the cloud server and receives from the remote data transmission/reception device the keyword that the remote data transmission/reception device has received from the cloud server;
a voice playback unit that plays back the second voice file received by the transmission/reception processing unit; and
an utterance sentence playback unit that plays back an utterance sentence based on the keyword received by the transmission/reception processing unit.

2. The data transmission/reception device according to claim 1,
wherein the transmission/reception processing unit:
obtains time information T by subtracting the time information Ta from the time information Tb;
when the value of the time information T is smaller than 0, determines that the degree of congestion of the Internet at the data transmission/reception device is higher than that at the remote data transmission/reception device, and performs a first process of stopping transmission of the video file to the cloud server and receiving from the remote data transmission/reception device the keyword that the remote data transmission/reception device received from the cloud server;
when the value of the time information T is greater than 0, determines that the degree of congestion of the Internet at the data transmission/reception device is lower than that at the remote data transmission/reception device, and performs a second process of acquiring the video file, transmitting the video file to the cloud server, and receiving the keyword from the cloud server; and
when the value of the time information T is 0, determines that the degree of congestion of the Internet at the data transmission/reception device is the same as that at the remote data transmission/reception device, and performs the first process or the second process.

3. The data transmission/reception device according to claim 2,
wherein, when the value of the time information T is smaller than 0, the transmission/reception processing unit determines that the degree of congestion of the Internet at the data transmission/reception device is higher than that at the remote data transmission/reception device, and performs, as the first process, stopping transmission of the video file to the cloud server, receiving from the remote data transmission/reception device the keyword that the remote data transmission/reception device received from the cloud server, and, after waiting for a predetermined time, acquiring the video file, transmitting it to the cloud server, and receiving the keyword from the cloud server.

4. The data transmission/reception device according to claim 1,
wherein the transmission/reception processing unit:
obtains time information T by subtracting the time information Ta from the time information Tb;
when the value of the time information T is smaller than 0, determines that the degree of congestion of the Internet at the data transmission/reception device is higher than that at the remote data transmission/reception device, and performs a third process of stopping transmission of the video file to the cloud server, receiving from the remote data transmission/reception device the keyword that the remote data transmission/reception device received from the cloud server, after waiting for a time N, where N is a preset value, acquiring the video file, transmitting it to the cloud server, receiving the keyword from the cloud server, and calculating the time information Ta, and, after waiting for the time N from the time the video file was acquired, receiving the time information Tb from the remote data transmission/reception device and obtaining the time information T;
when the value of the time information T is greater than 0, determines that the degree of congestion of the Internet at the data transmission/reception device is lower than that at the remote data transmission/reception device, and performs a fourth process of acquiring the video file, transmitting it to the cloud server, receiving the keyword from the cloud server, and calculating the time information Ta, and, after waiting for the time N from the time the video file was acquired, receiving the time information Tb from the remote data transmission/reception device and obtaining the time information T; and
when the value of the time information T is 0, determines that the degree of congestion of the Internet at the data transmission/reception device is the same as that at the remote data transmission/reception device, and performs the third process or the fourth process.

5. The data transmission/reception device according to claim 4,
wherein the transmission/reception processing unit sets the time N to a value greater than the preset value when the absolute value of the time information T is greater than a predetermined threshold, and sets the time N to the preset value when the absolute value of the time information T is equal to or less than the threshold.

6. The data transmission/reception device according to claim 1,
wherein, when starting playback of the utterance sentence, the utterance sentence playback unit waits until acquisition of the first voice file is completed if the first voice file is being acquired by the transmission/reception processing unit, or waits until playback of the second voice file is completed if the second voice file is being played back by the voice playback unit, and plays back the utterance sentence after waiting.

7. A program for causing a computer to function as the data transmission/reception device according to any one of claims 1 to 6.