JP7550335B1 - system

Info

Publication number: JP7550335B1
Application number: JP2024054015A
Authority: JP
Inventors: 裕子久保木
Original assignee: SoftBank Group Corp
Current assignee: SoftBank Group Corp
Priority date: 2023-09-20
Filing date: 2024-03-28
Publication date: 2024-09-12
Anticipated expiration: 2044-03-28

Abstract

【課題】本発明は、未登録または非通知の電話番号からの着信時に、生成系AIが応答し、通話終了後に用件を文書化して送信するシステムにおいて、家族や知人の確認手段や悪用歴のある番号の警察連携手段を提供することを課題としている。【解決手段】未登録または非通知の電話番号からの着信時に、生成系AIが最初の応答を行う応答手段と、通話終了後に用件をチャットGPTで文書化し、メッセンジャーアプリやメールで送信する送信手段と、家族や知人と名乗る場合には、専用の質問や合言葉を使用して確認する確認手段と、配送業者は、業者用の合言葉を使用して本人につながる接続手段と、通話中には、電話番号をチェックし、悪用歴のある番号の場合は警察に連携する連携手段とを含むシステム。【選択図】図１ [Problem] In a system in which a generative AI answers calls from unregistered or withheld phone numbers and, after the call ends, documents the caller's message and sends it to the user, the present invention aims to provide a means of verifying callers who claim to be family members or acquaintances and a means of coordinating with the police about numbers with a history of misuse. [Solution] The system includes: response means by which a generative AI makes the initial response to a call from an unregistered or withheld phone number; transmission means by which, after the call ends, the message is documented using ChatGPT and sent via a messenger app or email; confirmation means by which a caller who claims to be a family member or acquaintance is verified using a dedicated question or passphrase; connection means by which a delivery company is connected to the user by giving a passphrase for delivery companies; and linking means by which the phone number is checked during the call and, if the number has a history of misuse, the system coordinates with the police. [Selected Figure] Figure 1

Description

本開示の技術は、システムに関する。 The technology disclosed herein relates to a system.

特許文献１には、少なくとも一つのプロセッサにより遂行される、ペルソナチャットボット制御方法であって、ユーザ発話を受信するステップと、前記ユーザ発話を、チャットボットのキャラクターに関する説明と関連した指示文を含むプロンプトに追加するステップと前記プロンプトをエンコードするステップと、前記エンコードしたプロンプトを言語モデルに入力して、前記ユーザ発話に応答するチャットボット発話を生成するステップ、を含む、方法が開示されている。 Patent document 1 discloses a persona chatbot control method performed by at least one processor, the method including the steps of receiving a user utterance, adding the user utterance to a prompt including a description of the chatbot character and an associated instruction sentence, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

特開２０２２－１８０２８２号公報 JP 2022-180282 A

本発明は、未登録または非通知の電話番号からの着信時に、生成系AIが応答し、通話終了後に用件を文書化して送信するシステムにおいて、家族や知人の確認手段や悪用歴のある番号の警察連携手段を提供することを課題としている。 The objective of this invention is to provide a means of identifying family and acquaintances and a means of coordinating with the police for numbers with a history of misuse in a system in which a generative AI answers calls from unregistered or withheld phone numbers and documents and transmits the message after the call ends.

未登録または非通知の電話番号からの着信時に、生成系AIが最初の応答を行う応答手段と、通話終了後に用件をチャットGPTで文書化し、メッセンジャーアプリやメールで送信する送信手段と、家族や知人と名乗る場合には、専用の質問や合言葉を使用して確認する確認手段と、配送業者は、業者用の合言葉を使用して本人につながる接続手段と、通話中には、電話番号をチェックし、悪用歴のある番号の場合は警察に連携する連携手段とを含むシステム。 The system includes: response means by which a generative AI makes the initial response when a call is received from an unregistered or withheld phone number; transmission means by which, after the call ends, the purpose of the call is documented using ChatGPT and sent via a messenger app or email; confirmation means by which a caller who claims to be a family member or acquaintance is verified using a dedicated question or passphrase; connection means by which a delivery company is connected to the user by giving a passphrase for delivery companies; and linking means by which the phone number is checked during the call and, if the number has a history of misuse, the system coordinates with the police.

第１実施形態に係るデータ処理システムの構成の一例を示す概念図である。 FIG. 1 is a conceptual diagram showing an example of the configuration of a data processing system according to a first embodiment.
第１実施形態に係るデータ処理装置及びスマートデバイスの要部機能の一例を示す概念図である。 FIG. 2 is a conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
第２実施形態に係るデータ処理システムの構成の一例を示す概念図である。 FIG. 3 is a conceptual diagram showing an example of the configuration of a data processing system according to a second embodiment.
第２実施形態に係るデータ処理装置及びスマート眼鏡の要部機能の一例を示す概念図である。 FIG. 4 is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
第３実施形態に係るデータ処理システムの構成の一例を示す概念図である。 FIG. 5 is a conceptual diagram showing an example of the configuration of a data processing system according to a third embodiment.
第３実施形態に係るデータ処理装置及びヘッドセット型端末の要部機能の一例を示す概念図である。 FIG. 6 is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
第４実施形態に係るデータ処理システムの構成の一例を示す概念図である。 FIG. 7 is a conceptual diagram showing an example of the configuration of a data processing system according to a fourth embodiment.
第４実施形態に係るデータ処理装置及びロボットの要部機能の一例を示す概念図である。 FIG. 8 is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
複数の感情がマッピングされる感情マップを示す。 FIG. 9 shows an emotion map onto which multiple emotions are mapped.
複数の感情がマッピングされる感情マップを示す。 FIG. 10 shows an emotion map onto which multiple emotions are mapped.

以下、添付図面に従って本開示の技術に係るシステムの実施形態の一例について説明する。 Below, an example of an embodiment of a system related to the technology disclosed herein is described with reference to the attached drawings.

先ず、以下の説明で使用される文言について説明する。 First, let us explain the terminology used in the following explanation.

以下の実施形態において、符号付きのプロセッサ（以下、単に「プロセッサ」と称する）は、１つの演算装置であってもよいし、複数の演算装置の組み合わせであってもよい。また、プロセッサは、１種類の演算装置であってもよいし、複数種類の演算装置の組み合わせであってもよい。演算装置の一例としては、ＣＰＵ（Central Processing Unit）、ＧＰＵ（Graphics Processing Unit）、ＧＰＧＰＵ（General-Purpose computing on Graphics Processing Units）、ＡＰＵ（Accelerated Processing Unit）、又はＴＰＵ（Tensor Processing Unit）等が挙げられる。 In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as the "processor") may be a single arithmetic device or a combination of multiple arithmetic devices. The processor may also be one type of arithmetic device or a combination of multiple types of arithmetic devices. Examples of arithmetic devices include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and a TPU (Tensor Processing Unit).

以下の実施形態において、符号付きのＲＡＭ（Random Access Memory）は、一時的に情報が格納されるメモリであり、プロセッサによってワークメモリとして用いられる。 In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which the processor uses as working memory.

以下の実施形態において、符号付きのストレージは、各種プログラム及び各種パラメータ等を記憶する１つ又は複数の不揮発性の記憶装置である。不揮発性の記憶装置の一例としては、フラッシュメモリ（ＳＳＤ（Solid State Drive））、磁気ディスク（例えば、ハードディスク）、又は磁気テープ等が挙げられる。 In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

以下の実施形態において、符号付きの通信Ｉ／Ｆ（Interface）は、通信プロセッサ及びアンテナ等を含むインタフェースである。通信Ｉ／Ｆは、複数のコンピュータ間での通信を司る。通信Ｉ／Ｆに対して適用される通信規格の一例としては、５Ｇ（5th Generation Mobile Communication System）、Ｗｉ－Ｆｉ（登録商標）、又はＢｌｕｅｔｏｏｔｈ（登録商標）等を含む無線通信規格が挙げられる。 In the following embodiments, a communication I/F (Interface) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication I/F handles communication between multiple computers. Examples of communication standards applied to the communication I/F include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

以下の実施形態において、「Ａ及び／又はＢ」は、「Ａ及びＢのうちの少なくとも１つ」と同義である。つまり、「Ａ及び／又はＢ」は、Ａだけであってもよいし、Ｂだけであってもよいし、Ａ及びＢの組み合わせであってもよい、という意味である。また、本明細書において、３つ以上の事柄を「及び／又は」で結び付けて表現する場合も、「Ａ及び／又はＢ」と同様の考え方が適用される。 In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." In other words, "A and/or B" means that it may be only A, only B, or a combination of A and B. In addition, in this specification, the same concept as "A and/or B" is also applied when three or more things are expressed by connecting them with "and/or."

［第１実施形態］ [First Embodiment]
図１には、第１実施形態に係るデータ処理システム１０の構成の一例が示されている。 FIG. 1 shows an example of the configuration of a data processing system 10 according to the first embodiment.

図１に示すように、データ処理システム１０は、データ処理装置１２及びスマートデバイス１４を備えている。データ処理装置１２の一例としては、サーバが挙げられる。 As shown in FIG. 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

データ処理装置１２は、コンピュータ２２、データベース２４、及び通信Ｉ／Ｆ２６を備えている。コンピュータ２２は、本開示の技術に係る「コンピュータ」の一例である。コンピュータ２２は、プロセッサ２８、ＲＡＭ３０、及びストレージ３２を備えている。プロセッサ２８、ＲＡＭ３０、及びストレージ３２は、バス３４に接続されている。また、データベース２４及び通信Ｉ／Ｆ２６も、バス３４に接続されている。通信Ｉ／Ｆ２６は、ネットワーク５４に接続されている。ネットワーク５４の一例としては、ＷＡＮ（Wide Area Network）及び／又はＬＡＮ（Local Area Network）等が挙げられる。 The data processing device 12 includes a computer 22, a database 24, and a communication I/F 26. The computer 22 is an example of a "computer" according to the technology of the present disclosure. The computer 22 includes a processor 28, a RAM 30, and a storage 32. The processor 28, the RAM 30, and the storage 32 are connected to a bus 34. The database 24 and the communication I/F 26 are also connected to the bus 34. The communication I/F 26 is connected to a network 54. Examples of the network 54 include a WAN (Wide Area Network) and/or a LAN (Local Area Network).

スマートデバイス１４は、コンピュータ３６、受付装置３８、出力装置４０、カメラ４２、及び通信Ｉ／Ｆ４４を備えている。コンピュータ３６は、プロセッサ４６、ＲＡＭ４８、及びストレージ５０を備えている。プロセッサ４６、ＲＡＭ４８、及びストレージ５０は、バス５２に接続されている。また、受付装置３８、出力装置４０、及びカメラ４２も、バス５２に接続されている。 The smart device 14 includes a computer 36, a reception device 38, an output device 40, a camera 42, and a communication I/F 44. The computer 36 includes a processor 46, a RAM 48, and a storage 50. The processor 46, the RAM 48, and the storage 50 are connected to a bus 52. The reception device 38, the output device 40, and the camera 42 are also connected to the bus 52.

受付装置３８は、タッチパネル３８Ａ及びマイクロフォン３８Ｂ等を備えており、ユーザ入力を受け付ける。タッチパネル３８Ａは、指示体（例えば、ペン又は指等）の接触を検出することにより、指示体の接触によるユーザ入力を受け付ける。マイクロフォン３８Ｂは、ユーザの音声を検出することにより、音声によるユーザ入力を受け付ける。制御部４６Ａは、タッチパネル３８Ａ及びマイクロフォン３８Ｂによって受け付けたユーザ入力を示すデータをデータ処理装置１２に送信する。データ処理装置１２では、特定処理部２９０（図２参照）が、ユーザ入力を示すデータを取得する。 The reception device 38 includes a touch panel 38A and a microphone 38B, and receives user input. The touch panel 38A detects contact with an indicator (e.g., a pen or a finger) to receive user input by the touch of the indicator. The microphone 38B detects the user's voice to receive user input by voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 (see FIG. 2) acquires the data indicating the user input.

出力装置４０は、ディスプレイ４０Ａ及びスピーカ４０Ｂ等を備えており、データをユーザが知覚可能な表現形（例えば、音声及び／又はテキスト）で出力することでデータをユーザに対して提示する。ディスプレイ４０Ａは、プロセッサ４６からの指示に従ってテキスト及び画像等の可視情報を表示する。スピーカ４０Ｂは、プロセッサ４６からの指示に従って音声を出力する。カメラ４２は、レンズ、絞り、及びシャッタ等の光学系と、ＣＭＯＳ（Complementary Metal-Oxide-Semiconductor）イメージセンサ又はＣＣＤ（Charge Coupled Device）イメージセンサ等の撮像素子とが搭載された小型デジタルカメラである。 The output device 40 includes a display 40A and a speaker 40B, and presents data to the user by outputting the data in a form of expression that the user can perceive (e.g., voice and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs voice according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an imaging element such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

通信Ｉ／Ｆ４４は、ネットワーク５４に接続されている。通信Ｉ／Ｆ４４及び２６は、ネットワーク５４を介してプロセッサ４６とプロセッサ２８との間の各種情報の授受を司る。 The communication I/F 44 is connected to the network 54. The communication I/Fs 44 and 26 are responsible for transmitting and receiving various types of information between the processor 46 and the processor 28 via the network 54.

図２には、データ処理装置１２及びスマートデバイス１４の要部機能の一例が示されている。 Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

図２に示すように、データ処理装置１２では、プロセッサ２８によって特定処理が行われる。ストレージ３２には、特定処理プログラム５６が格納されている。特定処理プログラム５６は、本開示の技術に係る「プログラム」の一例である。プロセッサ２８は、ストレージ３２から特定処理プログラム５６を読み出し、読み出した特定処理プログラム５６をＲＡＭ３０上で実行する。特定処理は、プロセッサ２８がＲＡＭ３０上で実行する特定処理プログラム５６に従って特定処理部２９０として動作することによって実現される。 As shown in FIG. 2, in the data processing device 12, specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" according to the technology of the present disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

ストレージ３２には、データ生成モデル５８及び感情特定モデル５９が格納されている。データ生成モデル５８及び感情特定モデル５９は、特定処理部２９０によって用いられる。 The storage 32 stores a data generation model 58 and an emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

スマートデバイス１４では、プロセッサ４６によって受付出力処理が行われる。ストレージ５０には、受付出力プログラム６０が格納されている。受付出力プログラム６０は、データ処理システム１０によって特定処理プログラム５６と併用される。プロセッサ４６は、ストレージ５０から受付出力プログラム６０を読み出し、読み出した受付出力プログラム６０をＲＡＭ４８上で実行する。受付出力処理は、プロセッサ４６がＲＡＭ４８上で実行する受付出力プログラム６０に従って、制御部４６Ａとして動作することによって実現される。 In the smart device 14, the reception output process is performed by the processor 46. The storage 50 stores a reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output process is realized by the processor 46 operating as the control unit 46A in accordance with the reception output program 60 executed on the RAM 48.

なお、データ処理装置１２以外の他の装置がデータ生成モデル５８を有してもよい。例えば、サーバ装置（例えば、ＣｈａｔＧＰＴサーバ）がデータ生成モデル５８を有してもよい。この場合、データ処理装置１２は、データ生成モデル５８を有するサーバ装置と通信を行うことで、データ生成モデル５８が用いられた処理結果（予測結果など）を得る。また、データ処理装置１２は、サーバ装置であってもよいし、ユーザが保有する端末装置（例えば、携帯電話、ロボット、家電）であってもよい。次に、データ処理装置１２の特定処理部２９０による特定処理について説明する。 Note that a device other than the data processing device 12 may hold the data generation model 58. For example, a server device (e.g., a ChatGPT server) may hold the data generation model 58. In this case, the data processing device 12 communicates with the server device that holds the data generation model 58 to obtain a processing result (such as a prediction result) produced using the data generation model 58. The data processing device 12 may itself be a server device, or it may be a terminal device owned by the user (e.g., a mobile phone, a robot, or a home appliance). Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 is described.

（形態例１）
本発明を実施するための形態は、未登録または非通知の電話番号からの着信時に、生成系AIが応答する応答手段と、通話終了後に用件をチャットGPTで文書化し、メッセンジャーアプリやメールで送信する送信手段とを具備するシステムである。さらに、家族や知人と名乗る場合には、専用の質問や合言葉を使用して確認する確認手段と、配送業者は業者用の合言葉を使用して本人につながる接続手段と、通話中には電話番号をチェックし、悪用歴のある番号の場合は警察に連携する連携手段とを具備する。
応答手段、送信手段、確認手段、接続手段、および連携手段は、例えば、データ処理装置１２の特定処理部２９０によって実現される。また、これらの手段の一部または全部は、例えば、スマートデバイス１４の制御部４６Ａによって実現されてもよい。各手段と装置や制御部との対応関係は、上述した例に限定されず、種々の変更が可能である。
応答手段は、音声認識技術を活用した発話理解機能が含まれ、未登録または非通知の番号からの着信に対して、事前に訓練されたAIが生成した自然言語による応答を行う。このAIは、大量の通話データを学習しており、さまざまなシナリオに対応できる回答を生成する能力を有する。着信があると、AIはリアルタイムで着信内容を解析し、適切な応答を選択して会話を進める。応答内容は、通話の性質に合わせてカスタマイズされるため、営業電話からの質問には丁寧に対応し、不審な着信に対しては慎重に応答する。また、応答手段は、未登録または非通知の電話番号からの着信に対応するため、音声合成技術と発話内容理解技術を駆使し、事前に訓練を受けたAIが適切な対応を行うこともできる。このAIは複数の応答シナリオを学習し、通話の意図を把握して、状況に応じた自然な対話を提供することができる。着信内容に基づいてAIが瞬時に応答を選択し、通話の趣旨に合わせてカスタマイズされた対応を行うため、営業電話の問い合わせには適切に対処し、怪しい着信には慎重に応じることができる。
送信手段には、通話内容をテキスト化する音声認識技術と、そのテキストを基に通話の要約を生成する自然言語生成技術が含まれる。通話が終了すると、AIは通話内容をテキストデータに変換し、その内容を要約して文書化する。生成された文書は、メッセンジャーアプリやメールなどのコミュニケーション手段を介して、ユーザーに送信される。このプロセスにより、ユーザーは通話の詳細を後からでも確認でき、重要な情報を見逃すことがない。また、送信手段では、終了した通話の内容を音声認識技術によりテキストに変換し、その後、自然言語処理技術を用いて要約を行い、生成された文書をメッセンジャーアプリやメールでユーザーに送付することもできる。ユーザーは送付された文書を介して通話の詳細をいつでも確認可能で、重要な情報が記録され、見落とされることがなくなる。
確認手段は、家族や知人からの着信と名乗る場合に、特定の質問や合言葉を使用して本人確認を行う機能を有する。この手段は、ユーザーが設定した質問や合言葉をAIが使用し、着信者が正しい応答を行うことで本人であることを確認する。本人確認が成功すると、AIは通話を継続し、必要に応じてユーザーに転送する。このプロセスにより、不正アクセスや詐欺を防ぎながら、本人であれば円滑な通信を確保する。また、確認手段は、家族や知人を名乗る着信者に対して、ユーザーが設定した質問や合言葉をAIが提示し、正確な回答が得られた場合にのみ、通話を継続またはユーザーに転送することもできる。この機能により、不正なアクセスや詐欺を未然に防ぎつつ、本人であることが確認された場合にはスムーズなコミュニケーションを実現する。
接続手段は、配送業者などの特定の職種の者が業者用の合言葉を使用することで本人と通話ができる機能を有する。この手段では、AIが業者用の合言葉を受け取り、正しい合言葉であることを確認した後、通話を本人に繋ぐ。業者用の合言葉は、通話のセキュリティを確保するために重要であり、誤った合言葉が入力された場合、通話は繋がらない。また、接続手段は、配送業者などの職種固有の合言葉をAIが受け取り、その正当性を確認した上で、通話をユーザーにつなぐ機能を担うこともできる。この合言葉によって、通話のセキュリティが確保され、誤った合言葉の場合には通話が切断される仕組みとなっている。
連携手段は、通話中に電話番号をリアルタイムでチェックし、その番号に悪用歴がある場合は自動的に警察や関連機関に連携する機能を有する。この手段は、データベースに蓄積された不審な番号のリストを参照し、着信番号がそのリストに含まれているかを確認する。不審な番号からの通話であると判断された場合、システムは即座に警察への報告プロトコルを開始し、状況に応じて適切な対応を取る。また、連携手段は、通話中の番号をリアルタイムで監視し、過去に悪用された履歴がある番号であれば自動的に警察などの関連機関と連携する機能を果たすこともできる。不審な番号の検出時には、警察への通報プロトコルが起動し、迅速な対応が取られる。
これらの手段は、ユーザーのセキュリティと利便性を高めるために、複数のAI技術と連携機能を組み合わせて実装される。データ処理装置とスマートデバイスの制御部が協力し、これらの手段を柔軟に実現するためのプラットフォームが提供される。各手段は、独立して機能するだけでなく、連動してより高度なサービスを提供するために設計される。ユーザーは、これらの手段を通じて、未登録または非通知の電話番号からの着信に対する対応を自動化し、日常生活の中で発生する潜在的なリスクから自己を守ることができる。また、これらの手段は、AI技術を中心に構築され、ユーザーのセキュリティを向上させると同時に、日常のコミュニケーションを効率化する。センサーを使用しないデータ収集の代替手段として、ユーザーが手動で情報を入力し、システムがその情報を利用するプロセスが可能である。ユーザーはこれらの手段によって、未登録または非通知の着信に対する対処を自動化し、不測の事態からの保護を強化することができる。
本形態例のシステムには、さらなる機能を追加し、その利便性を高めることができる。例えば、応答手段には、会話中にユーザーの感情を解析する機能を追加し、通話相手の声のトーンや話し方から感情を推定し、それに応じた応答をAIが行う。これにより、通話相手が怒りや不安を感じている場合は、より慎重かつ共感的な応答が可能となり、通話の品質を向上させる。
送信手段には、文書化した通話内容をユーザーのカレンダーやリマインダーに自動的に統合する機能を追加する。これにより、通話で得たアポイントメントやタスクを忘れずに管理できるようになり、生産性の向上を図ることができる。
確認手段は、ユーザーの生体情報を利用して本人確認を行う機能を備える。例えば、スマートウォッチやフィットネストラッカーから得た生体情報を用いて、通話相手が実際に家族や知人であることを確認する。これにより、より高度なセキュリティを実現しつつ、利用者の手間を減らすことができる。
接続手段には、配送業者がQRコード（登録商標）またはNFCタグをスキャンすることで認証を行い、通話を本人に直接繋ぐ機能を追加する。これにより、合言葉のやり取りを省略し、より迅速かつ安全に配送業者との連携を図ることが可能となる。
連携手段では、通話中の音声データから不審なキーワードを検出し、その内容を自動で分析する機能を追加する。キーワードに基づいて不審な通話と判断された場合、警察への通報を行う前に、まずはユーザーに警告することで、より正確な判断をサポートする。
これらの追加機能により、本形態例のシステムは、通話の自動応答だけでなく、日々の生活の中でのセキュリティやスケジュール管理をより効率的にサポートする。また、各種のAI技術とデータベースの連携によって、ユーザーの生活に更なる安心と便利さを提供する。さらに、これらの技術をユーザーのスマートデバイスや家庭内のIoT機器と統合することで、よりパーソナライズされた体験を実現することが可能となる。
本形態例のシステムには、さらなる機能拡張が可能であり、利便性およびセキュリティを高めるために応答手段には、発信者の声紋を分析し、登録済みの家族や知人の声と照合する機能を追加することができる。これにより、発信者が本人であるかをより正確に判断し、特定の質問や合言葉を使用しなくても安全に通話を継続することが可能になる。
送信手段では、音声認識と自然言語処理を活用して得られた通話内容を、ユーザーが選択した複数の言語に翻訳し、異なる言語を話すユーザー間のコミュニケーションを支援する機能を組み入れることができる。これにより、国際的なビジネスシーンや多言語を話す家庭内での利用が促進される。
確認手段には、ユーザーが特定のジェスチャーや動作をカメラに示すことで本人確認を行う機能を追加することができる。この機能により、通話中に手軽にかつ迅速に本人確認を行うことが可能となり、セキュリティが一層強化される。
接続手段に関しては、AIが発信者の位置情報と配送予定情報を照合し、本人確認を行う機能を追加することで、配送業者が実際に商品を配送している場所から通話していることを確認し、通話の真正性を保証することが可能になる。
連携手段では、不審な通話が検出された際に、警察だけでなく、ユーザーの指定した緊急連絡先にも自動通報する機能を追加することで、万が一の緊急時に迅速な対応が行えるようになる。
これらの機能追加は、既存のシステムの基本的な構造を維持しつつ、ユーザーのニーズに合わせて柔軟な対応が可能となる。また、スマートデバイスや家庭内のIoT機器と連携することで、ユーザーが日常的に使用する環境においてもシームレスな経験を提供する。これらの追加機能により、本形態例のシステムは通話の自動応答を超え、日常生活のあらゆる面でユーザーのセキュリティと利便性を向上させる。
（形態例２）
本発明を実施するための形態は、通話終了後に用件をチャットGPTで文書化し、メッセンジャーアプリやメールで送信する際に、家族宛に送信可能な送信手段を具備するシステムである。具体的には、通話終了後に生成された文書を選択し、送信先を家族と指定することで、家族宛に用件を送信することができる。
送信手段は、例えば、データ処理装置１２の特定処理部２９０によって実現される。また、送信手段は、例えば、スマートデバイス１４の制御部４６Ａによって実現されてもよい。各手段と装置や制御部との対応関係は、上述した例に限定されず、種々の変更が可能である。
通話終了後の内容を文書化する手段は、音声認識技術を用いた会話内容解析手段を含む。この解析手段は、通話の音声データをテキストに変換し、通話の要点を抽出する。また、音声認識技術を用いた会話内容解析手段は、通話が終了した後に発生する音声データをテキストに変換し、そのテキストデータから通話の主要なポイントを抽出する機能を持つ。音声認識技術は、ディープラーニングに基づくモデルを活用し、さまざまな言語やアクセントに対応するためのトレーニングが行われる。また、この解析手段は、多様な言語やアクセントに対応するためにディープラーニングモデルを活用し、大量の音声データを基に学習を進める。会話内容解析手段は、形態素解析や構文解析を行い、会話の中で交わされた重要な情報や行動を要する内容を特定する。特に、キーワード抽出機能を用いて会話の中で頻繁に使われる単語やフレーズを特定し、それらを基に通話の要点を文書化する。また、会話の要点を明確にするために、形態素解析機能と構文解析機能を組み合わせ、会話中に交わされた重要な情報やアクションを要する内容を特定する。キーワード抽出機能は、会話の中で頻出する単語やフレーズを識別し、それらの情報を基に文書化する。
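The keyword extraction step described above (identifying words that occur frequently in the conversation and using them as the basis for the summary) can be illustrated with a minimal sketch. It assumes an English, whitespace-delimited transcript; Japanese text would first need a morphological analyzer, which is not shown. The function name `top_keywords` and the stopword list are hypothetical and do not come from the patent.

```python
import re
from collections import Counter
from typing import List

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "to", "of", "and", "is", "on", "for", "in", "at"}


def top_keywords(transcript: str, k: int = 3) -> List[str]:
    """Return the k most frequent non-stopword tokens in the transcript."""
    words = re.findall(r"[a-z0-9']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(k)]


text = "The delivery is on Friday. Delivery needs a signature. Friday delivery."
print(top_keywords(text, 2))
```

Frequency alone is a crude signal; the description also mentions syntactic analysis for finding items that require action, which this sketch does not attempt.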
文書化された内容をメッセージとしてフォーマットする手段は、テキストエディタ機能を含む。この機能は、解析されたテキストを整理し、文書の構造を整える。メッセージフォーマット手段は、ユーザが容易に内容を確認し、必要に応じて編集や追加情報を加えることができるインタフェースを提供する。また、文書化された内容をメッセージ形式に整える手段には、テキストエディタ機能が含まれ、解析されたテキストを整理し、文書のレイアウトを整える。ユーザがメッセージの内容を確認し、編集や追加情報を加えることができるインターフェイスを提供する。
送信手段は、メッセージを宛先指定機能を用いて特定の家族に送信する。この宛先指定機能は、ユーザの連絡先リストと連携し、選択された家族メンバーの連絡先情報に基づいてメッセージを自動的に送信する。また、送信手段は、メッセージ送信の確認と送信履歴を管理する機能も備えており、ユーザは送信されたメッセージの状態を追跡し、必要に応じて再送信を行うことができる。メッセージの送信は、メッセンジャーアプリやメールアプリとの連携によって行われる。メッセンジャーアプリやメールアプリとの連携機能は、ユーザのアカウント情報を用いて認証を行い、安全にメッセージを送信する。また、送信手段は、セキュリティ対策としてメッセージの暗号化や送信時の認証プロセスを実施し、プライバシーの保護を確保する。また、送信手段は、ユーザが選択したメッセージ送信方法に応じて、メッセージを適切なフォーマットで送信する。例えば、メッセンジャーアプリではインスタントメッセージとして、メールアプリでは電子メールとして送信する。メッセージの送信手段は、ユーザが選択した家族メンバーに対してメッセージを送るための宛先指定機能を備えており、ユーザの連絡先リストと連携して、家族メンバーの連絡先に基づいたメッセージの自動送信を行う。送信手段には、メッセージの送信状況を確認し、送信履歴を管理する機能も含まれており、ユーザは送信したメッセージの状態を追跡し、必要に応じて再送信を行うことができる。メッセージの送信プロセスは、メッセンジャーアプリやメールアプリと連携し、ユーザのアカウント情報を基に認証を行い、安全にメッセージを送信する機能を持つ。送信手段は、メッセージの内容を暗号化し、送信時の認証を行うセキュリティ対策を施して、プライバシーを守る。また、メッセンジャーアプリではインスタントメッセージとして、メールアプリでは電子メールとして、ユーザが選択した送信方法に応じて適切なフォーマットでメッセージを送信する。
センサーを含まないデータ収集の例としては、ユーザが手動で通話内容をメモする場合が考えられる。この場合、ユーザは通話終了後に自分で通話の要点を記録し、そのテキストデータをメッセージとして家族に送信する。ユーザが手動で記録した通話内容は、テキストエディタ機能を用いて整理され、フォーマットされたメッセージとして送信手段によって家族宛に送信される。この手動記録は、音声認識技術を用いた自動文書化が適用できない状況や、ユーザが特定の情報を自らの言葉で伝えたい場合に適している。また、センサーを用いないデータ収集の方法として、ユーザが通話終了後に手動で通話内容をメモし、その情報をメッセージとして家族に送るシナリオも考えられる。この手動記録は、テキストエディタ機能を用いて整理され、フォーマットされたメッセージとして送信される。音声認識技術が適用できない状況や、ユーザが自らの言葉で特定の情報を伝えたい場合に利用される。
本発明の実施形態では、通話内容の文書化を超えた機能を考慮することができる。例えば、音声認識によって生成されたテキストに基づき、スケジュール管理システムと連携して、通話中に言及されたアポイントメントや予定を自動的にカレンダーに登録する機能を追加する。これにより、ユーザーは通話後に手動でスケジュールを管理する手間を省くことができる。さらに、家族間で共有されるカレンダーへの予定登録を提案し、家族全員が予定を共有しやすくする。
また、文書化されたテキストデータを基に、自動的にタスクリストを生成し、家族全員がアクセスできる共有プラットフォームに投稿する機能を設けることも可能である。このプラットフォームでは、各家族メンバーがタスクの進捗を更新したり、完了したタスクにチェックを入れることができ、家族全員で情報を共有し協力する環境を構築する。
さらに、文書化されたメッセージに対して、感情分析を行い、通話中の感情的なニュアンスをテキストに反映させる機能を追加することで、メッセージの受取人が発信者の意図をより正確に理解することを助ける。例えば、通話中に喜びや心配といった感情が表れた場合、その感情をテキストに特定の絵文字やフォーマットで表現し、コミュニケーションの豊かさを高める。
また、音声認識と解析を活用して、通話内容から自動的にFAQやよくある質問リストを生成し、家族が同様の問い合わせをする際に参照できる知識ベースを構築する機能も考えられる。この知識ベースは、家族内で共有され、新たな通話が発生するたびに更新されることで、家族間のコミュニケーションの効率を向上させる。
さらに、通話終了後に生成されるテキストは、メッセンジャーアプリやメールで送信するだけでなく、音声形式で再生する機能を付加することで、視覚障害のある家族メンバーや読み書きが苦手な子供でも情報を容易に受け取れるようにする。
最後に、通話終了後に文書化された内容を、家族メンバーのプライバシーを保護するために、文書内の機微な情報を識別し、自動的に匿名化や伏せ字処理を行う機能を組み込むことで、安心して情報を共有できる環境を提供する。これにより、個々のプライバシーを尊重しつつ、必要な情報のみを共有するバランスを保つことができる。
本発明の実施形態は、通話内容の自動文書化と送信に関するものであり、これに新たな機能を追加することが考えられる。例えば、通話内容を分析し、通話終了後に自動でアクションアイテムを生成し、対応が必要なタスクとしてユーザーのスマートデバイスにリマインダーをセットする機能が考えられる。このリマインダーは、通話で言及された期限や重要性に基づいて優先度を設定し、ユーザーが忘れずに行動に移せるようサポートする。
さらに、家族間でのコミュニケーションを強化するために、文書化されたメッセージ内の特定の単語やフレーズに基づいて、関連する画像やビデオ、リンクを自動的に添付する機能を追加することも有益である。これにより、テキストベースのメッセージだけでなく、視覚的な情報も共有でき、コミュニケーションがより豊かになる。
また、通話内容の文書化に際して、プライバシーに配慮し、特定の個人情報や機密情報を自動的に検出し、ブラー処理や伏せ字に変換する機能を実装することで、セキュリティを高めることができる。このプロセスは、自然言語処理技術とプライバシー保護のガイドラインに従って行われる。
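The masking of personal information described above can be sketched with two regular expressions, one for phone numbers and one for email addresses. This is only an illustration under simple assumptions: the patterns, the placeholder strings, and the function name `redact` are hypothetical, and a real deployment would need much broader detection (names, addresses, account numbers).

```python
import re

# Hypothetical patterns; they cover only simple phone and email formats.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\- ]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL.sub("[email]", text)
    return PHONE.sub("[phone]", text)


print(redact("Call 090-1234-5678 or mail a.b@example.com"))
```

Emails are replaced before phone numbers so that digits inside an address are not masked separately.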
通話内容のテキスト化では、ユーザーの多様なニーズに対応するため、複数の言語への翻訳機能を組み込むことも有効である。家族が異なる言語を話す多文化の環境では、通話内容を自動的に翻訳し、各メンバーが理解しやすい言語でメッセージを送信することが可能となる。
さらなる利便性を追求するために、メッセージの送信タイミングをユーザーがカスタマイズできるスケジュール機能を追加する。ユーザーは、即時送信だけでなく、特定の日時にメッセージを送信するよう設定できるため、家族が情報を受け取るタイミングを最適化できる。
最後に、メッセージの受け取り側で、受信したメッセージに対するアクションを簡単にとれるよう、返信や確認のためのクイックアクションボタンを設けることで、迅速なフィードバックと効率的なコミュニケーションを実現する。これにより、家族間での情報共有がさらにスムーズに行われるようになる。
（形態例３）
本発明を実施するための形態は、通話中に電話番号をチェックし、悪用歴のある番号の場合には警察に連携する際に、警察との連携手段を具備するシステムである。具体的には、通話中に着信番号をデータベースと照合し、悪用歴のある番号であることを検出した場合には、自動的に警察に通報する機能を備えている。
連携手段は、例えば、データ処理装置１２の特定処理部２９０によって実現される。また、連携手段は、例えば、スマートデバイス１４の制御部４６Ａによって実現されてもよい。各手段と装置や制御部との対応関係は、上述した例に限定されず、種々の変更が可能である。
抽出手段は、通信網を介して発信される各通話の信号に含まれる発信者情報を抽出するために備わっており、デジタル信号処理技術を用いて通話データから発信者の電話番号を正確に取得することができる。また、抽出手段は、Caller ID情報を解析し、電話番号を特定するための信号解析機能を持っている。このシステムは、通話信号の中から発信者の電話番号を抽出するための信号抽出機能と、Caller ID情報の解読に特化した解析アルゴリズムを利用して、着信番号を正確に特定することもできる。
照合手段は、抽出された電話番号を不正利用が疑われる電話番号を収集し、カテゴリ別に整理したデータベースに照合する機能を有しており、データベース管理システムを通じてリアルタイムでの照合処理が可能であり、高速な検索アルゴリズムとインデックス技術により、通話が進行している間に迅速な照合が行われる。また、照合手段は、悪用歴のある番号をリスト化したデータベースを参照し、迅速な検索と照合を行うための高性能なデータベース検索機能とインデックス機能を備えている。
連携手段は、悪用歴のある番号が検出された場合に自動的に警察に通報する機能を持ち、通報システムとのインターフェイス機能が含まれ、通報する際の警察の受付システムとのプロトコルに基づいたデータ形式で通報情報を生成し、安全な通信チャネルを用いて警察の受付システムに送信する。通報情報の内容には、検出された悪用歴のある電話番号、通話の日時、通話の持続時間、発信者が利用している通信事業者などの情報が含まれ、個人情報の保護や通報の正確性を確保するために、暗号化技術や認証システムが用いられる。また、連携手段は、複数の通信プロトコルや通報システムとの互換性を持ち、システム間のデータ交換を円滑に行うためのアダプター機能を設け、通報のプロセスにおいて、警察の受付システムの要件に合わせて通報情報のフォーマットを調整し、適切な通報プロトコルを選択して通報を行う機能を持つ。通報プロセスが発動されると、システムは警察の受付システムに対して、通報情報を送信し、通報の受付確認を取得する。この確認は、通報が正しく行われたことをシステムが記録し、通報履歴として保存するための情報として利用される。通報履歴は、将来的な分析や改善のために用いられ、通報プロセスの効率化や精度向上に寄与する。この連携機能は、通報データ生成機能により、必要な情報を含む通報フォーマットを作成し、セキュアな通報送信機能を介して警察の受付システムへ情報を送信する。通報のセキュリティと信頼性を保証するため、データの暗号化機能とシステム認証機能が導入されている。また、システムの連携手段は、複数の通報プロトコルと互換性を持ち、異なる通報システムとのデータ交換を実現するアダプター機能を有しており、このアダプター機能は、通報プロトコル選択機能により、通報時のプロトコル要件に適した形式に自動的に調整し、通報情報の送信と受付確認を行う。通報履歴記録機能は通報の成功を記録し、システムのパフォーマンス分析や改善に使用される。
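The report contents listed above (the flagged phone number, the date and time of the call, its duration, and the caller's carrier) can be pictured as a small serializable record. The sketch below is an assumption-laden illustration: the patent does not define a schema or a police interface, so the class `MisuseReport`, its field names, and the JSON format are all hypothetical, and encryption and authentication are omitted.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class MisuseReport:
    """Fields named in the description; the schema itself is hypothetical."""
    phone_number: str
    call_started_at: str   # ISO 8601 timestamp in UTC
    duration_seconds: int
    carrier: str


def build_report(number: str, started: datetime, ended: datetime, carrier: str) -> str:
    """Serialize one report as JSON for a (hypothetical) reporting adapter."""
    report = MisuseReport(
        phone_number=number,
        call_started_at=started.astimezone(timezone.utc).isoformat(),
        duration_seconds=int((ended - started).total_seconds()),
        carrier=carrier,
    )
    return json.dumps(asdict(report), ensure_ascii=False, sort_keys=True)


start = datetime(2024, 3, 28, 9, 0, 0, tzinfo=timezone.utc)
end = datetime(2024, 3, 28, 9, 2, 30, tzinfo=timezone.utc)
print(build_report("+819012345678", start, end, "ExampleCarrier"))
```

An adapter of the kind the description mentions would convert this neutral record into whatever format a given receiving system requires.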
データ収集手段には、センサーを用いない例として、ユーザがアプリケーションやウェブインターフェース上で疑わしい通話に関する報告を行う機能があり、ユーザは、通話の経験や通話中に感じた不審な点をフォームに入力し、その情報がデータベースに登録される。この手動報告により収集されたデータは、自動的にデータベースに照合される電話番号のリストに追加される可能性があり、悪用歴のある番号の検出精度の向上に寄与する。また、センサーを使用しないデータ収集例として、ユーザが疑わしい通話について報告するための入力機能が提供される。ユーザはインタラクティブな報告フォームを通じて、疑わしい通話の内容をデータベースに登録し、これにより収集された情報は悪用歴のある番号の検出に活用される。この手動報告システムは、ユーザの経験と感覚に基づいて追加データを提供し、番号照合データベースの拡張に貢献する。
このシステムには、データベースの更新メカニズムを強化する機能を追加することができる。例えば、新たに悪用が確認された番号は、通報後も自動的にデータベースに追加される。さらに、疑わしい通話が報告された際には、その番号の信用情報を他のデータベースとも照合し、ユーザーからの報告に基づく情報と組み合わせることで、より正確な悪用歴の特定を実現する。データベースの整合性を保つために、定期的なクリーニングプロセスを実行し、誤った情報や古いデータを排除する仕組みも設けられる。また、通報システムとの連携を強化するため、警察が提供する犯罪データベースと直接連携し、照合プロセス中にリアルタイムで犯罪情報を取得し、照合結果の精度を向上させる機能も導入される。
通報の即時性を高めるために、通話が開始された瞬間に照合プロセスが開始され、悪用歴のある番号が検出された場合、通話者に警告音を出すか、自動的に通話を遮断するオプションも設けられる。さらに、通話を遮断した際には、通報者に代わって通話内容の録音を保存し、警察の調査に役立てることができる。警察が介入する際には、通報者の位置情報や通話履歴を含む詳細なレポートが自動生成され、犯罪捜査の迅速化を支援する。
ユーザインターフェースには、通報システムの透明性を高めるために、通報プロセスの進行状況をリアルタイムで確認できる機能が追加される。通報の結果や警察からのフィードバックをユーザが確認できるようにすることで、システムへの信頼性を向上させる。また、悪用歴のある番号に関する統計データやトレンド分析を提供し、ユーザが通話に対する警戒心を持つための情報提供も行われる。
さらに、悪用歴のある番号を特定するための機械学習技術を導入し、通話パターンや通話の頻度などの様々な指標を分析することで、悪用の可能性が高い新たな番号を予測する。これにより、データベースの予防的な更新が可能となり、未知の犯罪行為を防ぐための対策を強化する。また、ユーザが通報システムの効果について直接フィードバックを提供できる機能を設け、システムの改善に役立てる。フィードバックは匿名で行われることで、ユーザのプライバシーを保護しつつ、システムの改善に資する貴重な情報を収集する。
通話中の電話番号チェックをより効果的にするためには、ユーザーが直面する可能性のある様々な詐欺のパターンをAIが学習し、特定の単語やフレーズが通話中に検出された際にリアルタイムでフラグを立てる機能を実装する。これにより、単に電話番号がデータベース内の悪用歴と一致するだけでなく、通話の内容からも悪意を持った行動を推測し、検出することが可能になる。また、ユーザーが詐欺を疑う通話を簡単に報告できるショートカットやボタンをスマートフォンのインターフェースに設け、報告プロセスを簡略化する。これにより、データベースはより迅速に更新され、他のユーザーに対する保護が向上する。
警察との連携を強化するためには、通報された情報を基に警察が迅速に対応できるよう、通報システムに位置情報追跡機能を統合し、犯罪者の追跡と捕捉を支援する。また、通報システムに組み込まれる人工知能は、通報データから犯罪パターンを分析し、予防的な警戒活動を計画するための情報を警察に提供する。このような予測分析を活用することで、将来的な犯罪を未然に防ぐことに繋がる。
警察とのデータ共有を促進するために、警察が把握している詐欺事件やその他の犯罪に関する情報をリアルタイムで受け取り、データベースを更新する機能を設ける。これにより、通報システムは最新の犯罪情報に基づいて機能し、ユーザーを守るための対策が強化される。さらに、システムにはブロックリスト機能を追加し、ユーザーが自身で疑わしい番号を登録して通話を拒否できるようにする。これにより、ユーザー自身が直接リスクをコントロールすることが可能になる。
教育プログラムとして、ユーザーが詐欺の手口を認識し、予防するための情報を提供するオンライン講座やワークショップを開催する。これにより、ユーザーは自分自身を守るための知識を得ることができ、社会全体のセキュリティ意識が向上する。また、通報システムの利用によって防がれた詐欺事件の事例を共有し、ユーザーがシステムの実効性を理解しやすくする。
最後に、システムのアップデートを通じて、通話が詐欺である可能性が高いと判断された場合に、ユーザーに自動的に警告メッセージを送信し、詐欺に対する警戒を促す機能を実装する。これにより、ユーザーは即座に詐欺である可能性を認識し、適切な対応を取ることができるようになる。
(Example 1)
This embodiment of the present invention is a system that includes response means by which a generative AI answers an incoming call from an unregistered or withheld phone number, and transmission means by which, after the call ends, the purpose of the call is documented using ChatGPT and sent via a messenger app or email. The system further includes confirmation means by which a caller who claims to be a family member or acquaintance is verified using a dedicated question or passphrase, connection means by which a delivery company is connected to the user by giving a passphrase for delivery companies, and linking means by which the phone number is checked during the call and, if the number has a history of misuse, the system coordinates with the police.
The response means, the transmission means, the confirmation means, the connection means, and the linking means are realized, for example, by the specific processing unit 290 of the data processing device 12. In addition, some or all of these means may be realized, for example, by the control unit 46A of the smart device 14. The correspondence between each means and the device or control unit is not limited to the above example, and various changes are possible.
The response means includes a speech understanding function that utilizes voice recognition technology, and responds to calls from unregistered or withheld numbers in natural language generated by a pre-trained AI. This AI has learned a large amount of call data and has the ability to generate responses that can respond to various scenarios. When a call is received, the AI analyzes the content of the call in real time and selects an appropriate response to proceed with the conversation. The response content is customized to the nature of the call, so questions from sales calls are answered politely and suspicious calls are answered carefully. In addition, the response means can also use speech synthesis technology and speech content understanding technology to respond to calls from unregistered or withheld phone numbers, and a pre-trained AI can respond appropriately. This AI can learn multiple response scenarios, understand the intention of the call, and provide a natural dialogue according to the situation. The AI instantly selects a response based on the content of the call and responds customized to the purpose of the call, so inquiries from sales calls can be handled appropriately and suspicious calls can be answered carefully.
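The condition under which the AI answers first can be stated as a small predicate. The sketch below assumes that a withheld caller ID is represented as `None` and that registered contacts are held in a set of normalized numbers; the function name `needs_ai_first_response` is hypothetical and is not taken from the patent.

```python
from typing import Optional, Set


def needs_ai_first_response(caller_number: Optional[str], registered: Set[str]) -> bool:
    """Return True when the generative AI should make the initial response.

    caller_number is None when the caller ID is withheld.
    """
    if caller_number is None:                # withheld number
        return True
    return caller_number not in registered   # unregistered number


contacts = {"+81312345678"}
print(needs_ai_first_response(None, contacts))            # withheld
print(needs_ai_first_response("+81312345678", contacts))  # registered contact
```

How the AI then tailors its reply to the nature of the call is a separate step that this predicate does not cover.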
The transmission means includes voice recognition technology that converts the contents of the call into text and natural language generation technology that generates a summary of the call based on that text. When the call ends, the AI converts the contents of the call into text data and summarizes and documents the contents. The generated document is sent to the user via a communication means such as a messenger app or email. This process allows the user to check the details of the call at a later date and ensures that important information is not overlooked. The transmission means can also convert the contents of the completed call into text using voice recognition technology, then summarize it using natural language processing technology, and send the generated document to the user via a messenger app or email. The user can check the details of the call at any time through the sent document, and important information is recorded and will not be overlooked.
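The flow described above (transcript, then summary, then delivery over a chosen channel) can be sketched as follows. The summarizer is passed in as a plain function so that no particular speech recognition or language model API is implied; `first_sentence_summary` is a deliberately naive stand-in, and all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CallSummaryMessage:
    recipient: str
    channel: str   # "messenger" or "email"
    body: str


def build_summary_message(
    transcript: str,
    summarize: Callable[[str], str],
    recipient: str,
    channel: str = "email",
) -> CallSummaryMessage:
    """Summarize a finished call and package it for the chosen channel."""
    if channel not in ("messenger", "email"):
        raise ValueError("unsupported channel: " + channel)
    return CallSummaryMessage(recipient, channel, summarize(transcript))


def first_sentence_summary(text: str) -> str:
    """Placeholder summarizer: keeps only the first sentence."""
    return text.split(".")[0].strip() + "."


msg = build_summary_message(
    "Package arrives Friday. Please be home.", first_sentence_summary, "user@example.com"
)
print(msg)
```

Injecting the summarizer keeps the packaging logic testable and lets a language-model-backed summarizer replace the placeholder without other changes.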
The confirmation means verifies the caller's identity using specific questions or passphrases when the caller claims to be a family member or acquaintance. The AI poses the questions or passphrases set by the user, and the caller is confirmed as genuine only when the caller gives the correct response. If verification succeeds, the AI continues the call and transfers it to the user as necessary. This process prevents unauthorized access and fraud while keeping communication smooth for genuine callers. The confirmation means can also have the AI present the user-set questions or passphrases to callers claiming to be family members or acquaintances, and continue the call or transfer it to the user only if an accurate answer is given. This function heads off unauthorized access and fraud while enabling smooth communication once the caller is confirmed to be genuine.
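The comparison of a caller's answer against the user-set passphrase can be sketched as below. The sketch assumes the spoken answer has already been transcribed to text; the normalization steps (Unicode NFKC, trimming, case folding) and the function names are assumptions rather than anything the patent specifies. `hmac.compare_digest` is used so that the comparison time does not depend on where the strings first differ.

```python
import hmac
import unicodedata


def normalize(answer: str) -> bytes:
    """Fold width, case, and surrounding whitespace before comparing."""
    return unicodedata.normalize("NFKC", answer).strip().casefold().encode("utf-8")


def verify_caller(spoken_answer: str, expected_answer: str) -> bool:
    """Return True only when the transcribed answer matches the passphrase."""
    return hmac.compare_digest(normalize(spoken_answer), normalize(expected_answer))


print(verify_caller(" Blue Heron ", "blue heron"))
print(verify_caller("grey heron", "blue heron"))
```

The same check could serve the delivery-company passphrase in the connection means, with the result deciding whether the call is connected.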
The connection means allows a person in a specific occupation, such as a delivery driver, to reach the user by giving a passphrase designated for that occupation. The AI receives the passphrase, confirms that it is correct, and then connects the call to the user. The passphrase is important for call security: if an incorrect passphrase is given, the call is not connected. The connection means can also have the AI receive an occupation-specific passphrase, such as one for delivery companies, verify its validity, and then connect the call to the user. The passphrase secures the call, and if it is incorrect, the call is disconnected.
The linking means has a function of checking the phone number in real time during a call, and automatically linking to the police or related organizations if the number has a history of misuse. This means refers to a list of suspicious numbers stored in a database, and checks whether the incoming number is included in the list. If it is determined that the call is from a suspicious number, the system immediately starts a reporting protocol to the police, and takes appropriate action depending on the situation. The linking means can also monitor the number being called in real time, and automatically link to related organizations such as the police if the number has a history of misuse in the past. When a suspicious number is detected, a reporting protocol to the police is activated, and a prompt response is taken.
These means are implemented by combining multiple AI technologies and linking functions to enhance user security and convenience. The data processing device and the control unit of the smart device cooperate to provide a platform on which the means can be realized flexibly. Each means is designed to function independently and also to work with the others to provide more advanced services. As an example of data collection that does not use sensors, the user can manually enter information for the system to use. Through these means, users can automate responses to calls from unregistered or unnotified phone numbers, streamline daily communication, and strengthen their protection against potential risks and unforeseen events.
Further functions can be added to the system of this embodiment to enhance its convenience. For example, an emotion analysis function can be added to the response means so that the AI infers the other party's emotions from their tone of voice and manner of speaking and responds accordingly. This allows a more careful and empathetic response when the other party is angry or anxious, improving the quality of the call.
A function can be added to the sending means that automatically integrates documented calls into the user's calendar and reminders, helping the user remember appointments and tasks that arise during calls and improving productivity.
A function can be added to the verification means that verifies identity using biometric information. For example, biometric information obtained from a smartwatch or fitness tracker can be used to verify that the person on the other end of the line is actually a family member or acquaintance. This reduces the hassle for users while achieving a higher level of security.
A function can be added to the connection means that lets the delivery company scan a QR code (registered trademark) or NFC tag to authenticate and be connected directly to the user, eliminating the need for a password and allowing faster and safer communication with the delivery company.
A function can be added to the linking means that detects suspicious keywords in voice data during a call and automatically analyzes the content. If a call is deemed suspicious based on the keywords, the system first warns the user before reporting it to the police, helping the user make a more accurate decision.
With these additional functions, the system of this embodiment not only automatically answers calls, but also more efficiently supports security and schedule management in daily life. In addition, by linking various AI technologies with databases, it provides greater security and convenience to the user's life. Furthermore, by integrating these technologies with the user's smart devices and IoT devices in the home, it becomes possible to realize a more personalized experience.
The system of this embodiment can be further expanded. To increase convenience and security, a function can be added to the response means that analyzes the caller's voiceprint and compares it with the voices of registered family members and acquaintances. This makes it possible to determine more accurately whether the caller is who they claim to be, and to continue the call safely without using specific questions or passwords.
The transmission means can incorporate a function that uses voice recognition and natural language processing to translate the contents of the call into multiple languages selected by the user. This eases communication between users who speak different languages, for example in international business and in multilingual households.
The verification means can include a function that verifies identity when the person makes specific gestures or movements in front of a camera, making it possible to verify identity easily and quickly during a call and further enhancing security.
A function can be added to the connection means in which the AI verifies the caller's identity by comparing the caller's location information with delivery schedule information. This makes it possible to confirm that the delivery company is calling from a location where the goods are actually being delivered, thereby guaranteeing the authenticity of the call.
A function can be added to the linking means that automatically notifies not only the police but also the user's designated emergency contact when a suspicious call is detected, enabling a rapid response in an emergency.
These additional functions allow for flexible response to user needs while maintaining the basic structure of the existing system. In addition, by linking with smart devices and IoT devices in the home, the system provides a seamless experience in the environment in which the user uses the system on a daily basis. With these additional functions, the system of this embodiment goes beyond automatic answering of calls, improving the security and convenience of users in all aspects of their daily lives.
(Example 2)
This embodiment of the present invention is a system with a sending means that can send the matter to family members when the matter is documented with ChatGPT after the call ends and sent by messenger app or email. Specifically, the user selects the document created after the call ends and specifies "family members" as the destination, and the matter is sent to them.
The transmission means is realized, for example, by the specific processing unit 290 of the data processing device 12. The transmission means may also be realized, for example, by the control unit 46A of the smart device 14. The correspondence between each means and the device or control unit is not limited to the above example, and various modifications are possible.
The means for documenting the contents after the end of the call includes a conversation content analysis means based on voice recognition technology. After the call ends, this analysis means converts the voice data of the call into text and extracts the main points of the call from the text data. To support various languages and accents, it uses a deep learning model trained on a large amount of voice data. To clarify the main points, the analysis means combines morphological analysis and syntactic analysis to identify important information exchanged in the conversation and content requiring action. In particular, a keyword extraction function identifies words and phrases that appear frequently in the conversation, and the main points of the call are documented based on them.
The means for formatting the documented content into a message includes a text editor function, which organizes the parsed text and arranges the document structure and layout. It also provides an interface that allows the user to easily check the content of the message and add edits or additional information as necessary.
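The frequency-based keyword extraction mentioned above can be illustrated with a short sketch. It assumes English text and a simple regular-expression tokenizer as a stand-in for the morphological analysis the source describes; the stopword list and the name `top_keywords` are hypothetical.

```python
import re
from collections import Counter
from typing import List

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {
    "the", "a", "an", "to", "of", "and", "is", "on", "at", "for",
    "will", "be", "please", "your", "you", "it", "in", "by",
}


def top_keywords(transcript: str, k: int = 3) -> List[str]:
    """Return the k most frequent content words in a call transcript."""
    words = re.findall(r"[a-z0-9']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(k)]
```

For Japanese transcripts the tokenizer would need to be replaced by a morphological analyzer, since words are not separated by spaces.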
The sending means sends a message to the family member selected by the user using a destination designation function. This function works in conjunction with the user's contact list to send the message automatically based on the selected family member's contact information. The sending means also confirms message transmission and manages a transmission history, so that the user can track the status of a sent message and resend it as necessary. The message is sent through a link with a messenger app or an email app; the linking function authenticates the user with the user's account information and transmits the message securely. As security measures, the sending means encrypts the contents of the message and performs an authentication process at the time of transmission to protect privacy. The message is sent in a format appropriate to the sending method selected by the user: as an instant message in the messenger app, or as an email in the email app.
An example of data collection that does not use sensors is the user manually taking notes on a phone call. In this case, the user records the main points of the call after it ends, the notes are organized using the text editor function, and the sending means sends them to the family as a formatted message. Manual recording suits situations where automatic documentation using voice recognition technology cannot be applied, or where the user wants to convey specific information in his or her own words.
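The channel-specific formatting and transmission history described above might look like the following sketch. All names (`FamilySender`, `SendRecord`, the `transport` callable, the channel labels) are hypothetical, and encryption and account authentication are deliberately left out; the sketch covers only destination lookup, formatting, and history.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List


@dataclass
class SendRecord:
    recipient: str      # resolved address or messenger ID
    channel: str        # "email" or "messenger" (assumed labels)
    payload: str        # text actually handed to the transport
    sent_at: datetime
    delivered: bool     # what the transport reported back


@dataclass
class FamilySender:
    contacts: Dict[str, Dict[str, str]]            # name -> {channel: address}
    history: List[SendRecord] = field(default_factory=list)

    def format_for(self, channel: str, summary: str) -> str:
        """Shape the same summary differently per channel."""
        if channel == "email":
            return f"Subject: Call summary\n\n{summary}"
        if channel == "messenger":
            return summary
        raise ValueError(f"unsupported channel: {channel}")

    def send(self, name: str, channel: str, summary: str,
             transport: Callable[[str, str], bool]) -> SendRecord:
        """Look up the destination, send, and record the outcome."""
        address = self.contacts[name][channel]
        payload = self.format_for(channel, summary)
        delivered = bool(transport(address, payload))
        record = SendRecord(address, channel, payload,
                            datetime.now(timezone.utc), delivered)
        self.history.append(record)
        return record

    def failed(self) -> List[SendRecord]:
        """Records the user may want to resend."""
        return [r for r in self.history if not r.delivered]
```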
In an embodiment of the present invention, functions beyond documenting the contents of a call can be considered. For example, a function can be added that automatically registers appointments and events mentioned during a call in a calendar in cooperation with a schedule management system based on the text generated by speech recognition. This can save the user the trouble of manually managing the schedule after the call. In addition, it can suggest adding events to a calendar shared among family members, making it easier for all family members to share the schedule.
It is also possible to automatically generate task lists based on documented text data and post them to a shared platform that can be accessed by all family members, where each family member can update the progress of tasks and check off completed tasks, creating an environment for the whole family to share information and cooperate.
In addition, a function can be added that performs sentiment analysis on documented messages and reflects the emotional nuances expressed during a call in the text, helping message recipients understand the caller's intent more accurately. For example, if emotions such as joy or anxiety are expressed during a call, they are conveyed in the text with specific emojis and formatting, enriching communication.
Another possible function would be to use voice recognition and analysis to automatically generate FAQs and a list of frequently asked questions from the content of calls, building a knowledge base that family members can refer to when they have similar inquiries. This knowledge base would be shared among family members and updated every time a new call occurs, improving the efficiency of communication between family members.
In addition, the text generated after the call ends can be sent not only via messenger apps or email, but also played back in audio format, making it easier for visually impaired family members or children with difficulty reading and writing to receive the information.
Finally, to protect the privacy of family members, sensitive information in the document created after the call can be identified and automatically anonymized or masked, providing an environment in which information can be shared with peace of mind. This balances respect for each individual's privacy with sharing only the information that is necessary.
This embodiment of the present invention relates to automatic documentation and transmission of call contents, and further functions can be built on the document. For example, the document can be analyzed after the call ends to generate action items automatically, and each can be set as a reminder on the user's smart device as a task that needs to be addressed. Reminders can be prioritized based on the deadlines and importance mentioned in the call, helping the user act without forgetting.
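Prioritizing reminders by deadline and importance can be expressed as a sort with a compound key. The sketch below assumes a three-level importance scale and the hypothetical names `ActionItem` and `prioritize`; how deadlines and importance are extracted from the call text is outside its scope.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class ActionItem:
    title: str
    due: Optional[date]   # deadline mentioned in the call, if any
    importance: int       # 1 (low) to 3 (high), an assumed scale


def prioritize(items: List[ActionItem]) -> List[ActionItem]:
    """Earliest deadline first, undated items last, importance breaks ties."""
    return sorted(
        items,
        key=lambda i: (i.due is None, i.due or date.max, -i.importance),
    )
```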
Additionally, to enhance communication among family members, it would be beneficial to add the ability to automatically attach relevant images, videos, and links based on specific words or phrases in a written message, allowing for visual information to be shared in addition to text-based messages, making communication richer.
In addition, the system can enhance security by using natural language processing techniques to automatically detect and obscure certain personal or confidential information when documenting call transcripts, in accordance with privacy protection guidelines.
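A very simple form of such masking uses pattern matching before the document is shared. The sketch below is an assumption-heavy illustration: it uses regular expressions rather than the natural language processing the text refers to, covers only email addresses and phone-like digit runs, and its patterns are deliberately crude (a date followed by a time could be masked as a phone number).

```python
import re

# Illustrative patterns only; real detection would need locale-aware rules.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\- ]{7,}\d")


def mask_sensitive(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[email]", text)
    return PHONE.sub("[phone]", text)
```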
When converting call contents into text, it is also effective to incorporate a translation function into multiple languages to meet the diverse needs of users. In a multicultural environment where family members speak different languages, it is possible to automatically translate the call contents and send messages in a language that each member can easily understand.
To further enhance convenience, a schedule function will be added that allows users to customize the timing of message sending. Users can set messages to be sent at a specific date and time, rather than just immediately, allowing them to optimize the timing at which their family members receive information.
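Scheduled sending can be modeled as a priority queue keyed on the send time. This is a minimal sketch with hypothetical names (`Outbox`, `schedule`, `due`); it assumes some outer loop or timer calls `due` periodically and passes the returned messages to the sending means.

```python
import heapq
from datetime import datetime
from typing import List, Tuple


class Outbox:
    """Holds messages until their user-chosen send time arrives."""

    def __init__(self) -> None:
        self._queue: List[Tuple[datetime, int, str]] = []
        self._seq = 0  # tie-breaker so equal times keep insertion order

    def schedule(self, message: str, send_at: datetime) -> None:
        heapq.heappush(self._queue, (send_at, self._seq, message))
        self._seq += 1

    def due(self, now: datetime) -> List[str]:
        """Pop and return every message whose send time has passed."""
        ready = []
        while self._queue and self._queue[0][0] <= now:
            ready.append(heapq.heappop(self._queue)[2])
        return ready
```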
Finally, quick action buttons for replying or confirming messages have been provided so that recipients can easily take action on received messages, enabling quick feedback and efficient communication, making information sharing between family members even smoother.
(Example 3)
This embodiment of the present invention is a system with a linking means that checks the phone number during a call and links with the police if the number has a history of misuse. Specifically, the system checks the incoming number against a database during the call and automatically notifies the police if it detects that the number has a history of misuse.
The linking means is realized, for example, by the specific processing unit 290 of the data processing device 12. The linking means may also be realized, for example, by the control unit 46A of the smart device 14. The correspondence between each means and the device or control unit is not limited to the above example, and various modifications are possible.
The extraction means extracts the caller information contained in the signal of each call sent through the communication network, and uses digital signal processing technology to obtain the caller's telephone number accurately from the call data. It has a signal analysis function that analyzes Caller ID information and identifies the telephone number. By combining this signal extraction function with an analysis algorithm specialized in decoding Caller ID information, the system can accurately identify the incoming number.
The matching means matches the extracted telephone number against a database that collects telephone numbers suspected of fraudulent use and organizes them by category. Matching is performed in real time through a database management system, and high-speed search algorithms and indexing technology allow rapid matching while the call is in progress. The matching means thus has a high-performance database search function and indexing function for rapidly searching the database of numbers with a history of misuse.
The linking means automatically reports to the police when a number with a history of abuse is detected. It includes an interface function with the reporting system: when reporting, it generates report information in a data format based on the protocol of the police reception system and transmits it to that system over a secure communication channel. The report information includes the detected abused phone number, the date and time of the call, the duration of the call, and the telecommunications carrier used by the caller. Encryption technology and authentication systems are used to protect personal information and ensure the security and reliability of the report. The linking means is also compatible with multiple communication protocols and reporting systems. Its adapter function enables smooth data exchange between different systems, and its report protocol selection function selects an appropriate reporting protocol and automatically adjusts the format of the report information to meet the requirements of the police reception system. When the reporting process is activated, the system transmits the report information to the police reception system and obtains a report reception confirmation. The report history recording function uses this confirmation to record that the report was made correctly and stores it as a report history, which is used for future analysis and for improving the efficiency and accuracy of the reporting process.
As an example of data collection without sensors, a function is provided for users to report suspicious calls through an application or web interface. Through an interactive reporting form, the user enters their experience of the call and any suspicious points they noticed, and the information is registered in a database. Numbers collected through this manual reporting can be added to the list of phone numbers that are automatically matched, so that data based on users' experience and intuition expands the number matching database and improves the accuracy of detecting numbers with a history of abuse.
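An indexed lookup of this kind can be sketched with Python's built-in `sqlite3` module. The table name `misuse_numbers`, the category labels, and the digit-only normalization are assumptions for illustration. Declaring the number column as the primary key gives SQLite an index to search, so each check is an index lookup rather than a full table scan.

```python
import sqlite3
from typing import Iterable, Optional, Tuple


def open_registry(rows: Iterable[Tuple[str, str]]) -> sqlite3.Connection:
    """Create an in-memory registry of (number, category) pairs."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE misuse_numbers ("
        " number TEXT PRIMARY KEY,"     # indexed by the primary key
        " category TEXT NOT NULL)"
    )
    db.executemany("INSERT INTO misuse_numbers VALUES (?, ?)", rows)
    db.commit()
    return db


def normalize(number: str) -> str:
    """Keep digits only so '03-1234-5678' and '0312345678' match."""
    return "".join(ch for ch in number if ch.isdigit())


def lookup(db: sqlite3.Connection, number: str) -> Optional[str]:
    """Return the misuse category for a number, or None if not listed."""
    row = db.execute(
        "SELECT category FROM misuse_numbers WHERE number = ?",
        (normalize(number),),
    ).fetchone()
    return row[0] if row else None
```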
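The adapter idea, one report rendered in whichever format the receiving system expects, can be sketched as a table of formatter functions keyed by protocol name. The protocol labels `json-v1` and `key-value` and every name below are hypothetical, since the source does not specify any police reception protocol. Encryption, transmission, and receipt confirmation are omitted.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict


@dataclass
class MisuseReport:
    number: str              # detected number with a misuse history
    call_started: datetime   # date and time of the call
    duration_seconds: int    # duration of the call
    carrier: str             # telecommunications carrier used by the caller


def to_json_v1(report: MisuseReport) -> str:
    return json.dumps(
        {
            "number": report.number,
            "call_started": report.call_started.isoformat(),
            "duration_seconds": report.duration_seconds,
            "carrier": report.carrier,
        },
        sort_keys=True,
    )


def to_key_value(report: MisuseReport) -> str:
    return "\n".join([
        f"number={report.number}",
        f"call_started={report.call_started.isoformat()}",
        f"duration_seconds={report.duration_seconds}",
        f"carrier={report.carrier}",
    ])


# Hypothetical protocol names mapped to their formatters.
ADAPTERS: Dict[str, Callable[[MisuseReport], str]] = {
    "json-v1": to_json_v1,
    "key-value": to_key_value,
}


def build_payload(report: MisuseReport, protocol: str) -> str:
    """Select the adapter for the receiving system and render the report."""
    try:
        adapter = ADAPTERS[protocol]
    except KeyError:
        raise ValueError(f"unknown report protocol: {protocol}") from None
    return adapter(report)
```

Adding support for another reception system then means registering one more formatter, without touching the detection or transmission code.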
The system can be equipped with a function to strengthen the database update mechanism. For example, newly confirmed abused numbers will be automatically added to the database even after they are reported. In addition, when a suspicious call is reported, the credit information of the number will be checked against other databases and combined with information based on user reports to more accurately identify abuse history. To ensure the integrity of the database, a regular cleaning process will be carried out to eliminate incorrect and outdated information. In addition, to strengthen cooperation with the reporting system, a function will be introduced to directly link with the crime database provided by the police, obtain crime information in real time during the matching process, and improve the accuracy of the matching results.
To improve the immediacy of reports, the matching process begins the moment the call is initiated, and if a number with a history of abuse is detected, the caller will be given the option to sound an alarm or automatically hang up. In addition, when the call is hung up, a recording of the call can be saved on the caller's behalf to assist police investigations. When police intervene, a detailed report including the caller's location and call history is automatically generated, helping to speed up criminal investigations.
The user interface will be updated to include a feature that allows users to check the progress of the reporting process in real time to increase transparency in the reporting system. Users will be able to check the results of their reports and feedback from the police, which will increase trust in the system. The system will also provide statistical data and trend analysis on abused numbers, providing information to users to be cautious about calls.
Furthermore, machine learning technology will be introduced to identify numbers with a history of abuse, and new numbers likely to be abused will be predicted by analyzing various indicators such as call patterns and call frequency. This will enable proactive updates to the database, strengthening measures to prevent unknown criminal activity. A function will also be added that allows users to provide direct feedback on the effectiveness of the reporting system, which will help improve the system. Feedback will be provided anonymously, protecting user privacy while collecting valuable information that will contribute to improving the system.
To make the phone number check during a call more effective, AI will learn the patterns of various fraudulent scams that users may face and implement a function to flag in real time when certain words or phrases are detected during a call. This will allow malicious behavior to be inferred and detected from the content of the call, rather than simply matching the phone number with abuse history in the database. In addition, the reporting process will be simplified by providing shortcuts and buttons on the smartphone interface that allow users to easily report calls that they suspect are fraudulent. This will allow the database to be updated more quickly and improve protection for other users.
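Real-time phrase flagging can be approximated by scanning each transcribed utterance for known phrases and raising a flag once enough distinct phrases have appeared. The phrase list, the threshold of two, and the names `flag_utterance` and `CallMonitor` are illustrative assumptions; the learned fraud patterns the text describes would replace the fixed list.

```python
import re
from typing import List, Set

# Illustrative phrases only; not taken from the source.
SUSPICIOUS_PHRASES = (
    "gift card", "wire transfer", "refund fee", "do not tell anyone",
)
PATTERN = re.compile(
    "|".join(re.escape(p) for p in SUSPICIOUS_PHRASES), re.IGNORECASE
)


def flag_utterance(text: str) -> List[str]:
    """Return the distinct suspicious phrases found in one utterance."""
    return sorted({m.group(0).lower() for m in PATTERN.finditer(text)})


class CallMonitor:
    """Accumulates hits across a call and flags it past a threshold."""

    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold
        self.hits: Set[str] = set()

    def feed(self, utterance: str) -> bool:
        """Process one utterance; True means the call should be flagged."""
        self.hits.update(flag_utterance(utterance))
        return len(self.hits) >= self.threshold
```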
To strengthen cooperation with the police, the reporting system will be integrated with location tracking capabilities to help police respond quickly based on reported information, helping them track and capture criminals. Artificial intelligence will also be built into the reporting system to analyze crime patterns from report data and provide police with information to plan preventive vigilance activities. Using such predictive analytics will help prevent future crimes.
To facilitate data sharing with the police, the system will receive real-time information on fraud cases and other crimes known to the police and update the database. This will ensure that the reporting system operates based on the latest crime information and strengthen measures to protect users. In addition, the system will be equipped with a block list function, allowing users to register suspicious numbers and block calls from them, allowing users to directly control the risks themselves.
Educational programs will include online courses and workshops to provide users with information to recognize and prevent fraud methods. This will provide users with the knowledge to protect themselves and raise security awareness in society as a whole. Examples of fraud cases that were prevented by using the reporting system will also be shared to help users understand the effectiveness of the system.
Finally, through a system update, if a call is deemed likely to be fraudulent, a warning message will be automatically sent to the user to warn them against fraud. This will allow users to immediately recognize the possibility of fraud and take appropriate action.

以下に、各形態例の処理の流れについて説明する。 The process flow for each example is explained below.

（形態例１）
ステップ１：未登録または非通知の電話番号からの着信があった場合、生成系AIが応答する。
ステップ２：通話終了後、用件をチャットGPTで文書化する。
ステップ３：文書化された用件をメッセンジャーアプリやメールで送信する。家族や知人と名乗る場合には、専用の質問や合言葉を使用して確認する。配送業者は業者用の合言葉を使用して本人につながる。通話中には電話番号をチェックし、悪用歴のある番号の場合は警察に連携する。
（形態例２）
ステップ１：通話終了後、用件をチャットGPTで文書化する。
ステップ２：文書化された用件を選択し、送信先を家族と指定する。
ステップ３：家族宛に用件をメッセンジャーアプリやメールで送信する。
（形態例３）
ステップ１：通話中に着信番号をデータベースと照合し、悪用歴のある番号であることを検出する。
ステップ２：悪用歴のある番号である場合、自動的に警察に通報する。
ステップ３：通話終了後、用件をチャットGPTで文書化する。文書化された用件は、警察との連携に使用される。
(Example 1)
Step 1: When a call comes in from an unregistered or withheld phone number, the generative AI answers.
Step 2: After the call ends, the matter is documented using ChatGPT.
Step 3: The documented matter is sent via messenger app or email. If the caller claims to be a family member or acquaintance, the caller is verified using special questions or a password. A delivery company is connected to the user by using the password for delivery companies. The phone number is checked during the call, and if the number has a history of misuse, the system links with the police.
(Example 2)
Step 1: After the call ends, the matter is documented using ChatGPT.
Step 2: The documented matter is selected and family members are specified as the destination.
Step 3: The matter is sent to the family members via messenger app or email.
(Example 3)
Step 1: During a call, the incoming number is checked against a database to detect if it is a number with a history of abuse.
Step 2: If the number has a history of abuse, the call will automatically be reported to the police.
Step 3: After the call ends, the matter is documented using ChatGPT. The documented matter is used in linking with the police.

更に、ユーザの感情を推定する感情エンジンを組み合わせてもよい。すなわち、特定処理部２９０は、感情特定モデル５９を用いてユーザの感情を推定し、ユーザの感情を用いた特定処理を行うようにしてもよい。 Furthermore, an emotion engine that estimates the user's emotion may be combined. That is, the identification processing unit 290 may estimate the user's emotion using the emotion identification model 59, and perform identification processing using the user's emotion.

（形態例１）
本発明を実施するための形態は、未登録または非通知の電話番号からの着信時に、生成系AIが応答する応答手段と、通話終了後に用件をチャットGPTで文書化し、メッセンジャーアプリやメールで送信する送信手段とを具備するシステムである。さらに、ユーザの感情を認識する感情エンジンを組み合わせる感情認識手段として、通話中にユーザの声のトーンや言葉の選択などを分析し、感情を推定する。推定された感情に基づいて、生成系AIの応答や文書化された用件の表現を調整する。例えば、ユーザが不安な感情を示している場合には、より穏やかな表現を使用することで安心感を与える。
応答手段および送信手段は、例えば、データ処理装置１２の特定処理部２９０によって実現される。感情認識手段は、例えば、スマートデバイス１４のマイクロフォン３８Ｂを用いてユーザの声のトーンを検出し、制御部４６Ａによって分析を行い、データ処理装置１２の特定処理部２９０によって感情を推定する。また、応答手段、送信手段、および感情認識手段の一部または全部は、例えば、スマートデバイス１４の制御部４６Ａによって実現されてもよい。各手段と装置や制御部との対応関係は、上述した例に限定されず、種々の変更が可能である。
応答手段は、自然言語処理と音声合成を組み合わせた対話生成手段が含まれる。この対話生成手段は、未登録または非通知の着信に対して自動で応答を行い、ユーザとの対話を可能にする。未登録または非通知の着信を検出すると、生成系AIは事前に訓練された会話モデルを用いて応答する。この会話モデルは、様々なシナリオに対応できるように多様な対話データを基に学習しており、着信者の質問や要求に対して適切な回答や案内を提供する。また、応答手段は、AIによる対話生成機能を備え、未登録または非通知の電話番号からの着信に対して、自動的に適切な返答を行うことができる。この機能は、通話の初期段階で着信者の目的や要望を理解し、対応する応答を行うための対話管理機能と、生成された応答を自然な音声に変換することもできるための音声生成機能を組み合わせたものである。対話管理機能は、特定のキーワードやフレーズの検出に基づいて着信者の意図を分析し、適切な返答を生成することができる。音声生成機能は、テキストベースの応答をリアルタイムに音声に変換し、着信者に対して自然な会話体験を提供することができる。
送信手段には、チャットボットや自然言語理解技術を用いた文書化手段が含まれる。通話終了後、チャットGPTのような高度な自然言語理解モデルを使用して、通話内容を精確に文書化する。文書化手段は、通話の要点を抽出し、要約する能力を有しており、用件を簡潔かつ明瞭に伝える文書を生成することができる。生成された文書は、メッセンジャーアプリやメールを介してユーザに送信される。この過程には、ユーザのメールアドレスやメッセンジャーアカウントへの連携機能が含まれ、文書は適切な形式で自動的に送付される。また、送信手段は、通話内容をテキスト化し、これをユーザがアクセス可能な形で提供することができる。通話が終了すると、通話内容文書化機能が活動し、会話の主要なポイントを抽出し、要約することができる。この要約されたテキストは、自動配信機能を通じてユーザ指定のメールアドレスやメッセンジャーアプリに送信される。この自動配信機能には、文書を適切なフォーマットで整え、指定された送信先に確実に届けるためのメール送信機能やアプリ連携機能が含まれる。
感情認識手段には、音声分析を行う声紋解析手段と、言葉の選択から感情を推定する言語解析手段が含まれる。声紋解析手段は、スマートデバイスのマイクロフォンを利用してユーザの声のトーン、ピッチ、速度などの特徴を検出し、それらの声の特性からユーザの感情状態を推定することができる。言語解析手段は、ユーザの発言の内容を解析し、使用される言葉やフレーズから感情的なコンテキストを抽出することができる。これらの分析結果は、生成系AIが応答を行う際や、文書化された用件の表現を調整する際に使用される。例えば、ユーザが不安を示している場合、応答や文書の表現はより穏やかで安心感を与えるように調整される。また、感情認識手段は、通話中のユーザの声の特徴と言語使用から感情を推定する機能を有する。声紋分析機能は、マイクロフォンによって収集された音声データから、声の高低、強弱、速度などの特徴を抽出し、これらの特性を解析することで感情を推定することができる。言語感情分析機能は、通話中の言語データを処理し、使用される単語やフレーズが持つ感情的な意味を解析し、ユーザの感情状態を把握することができる。これらの分析結果は、AIが行う応答のトーンや、文書化された通話内容の表現を調整する際に利用され、ユーザに対して適切な感情的対応を提供することができる。
センサーを含まないデータ収集手段としては、ユーザが自身で入力するテキストデータや、システム利用に関するフィードバックが挙げられる。これらは、ユーザ入力受付機能やフィードバック収集機能を通じてシステムに提供され、サービス改善のための貴重な情報源として活用される。
これらの手段は、ユーザの要求に迅速かつ効果的に対応し、コミュニケーションの質を向上させることを目的としている。また、各手段の実装は、データ処理装置やスマートデバイスの制御部によって柔軟に行われ、システムの効率性とユーザビリティを高めるために様々な形で変更が可能となっている。
本発明のシステムは、追加機能として、未登録または非通知の着信に対して、応答前に通話者の意図を推測するための概要予測手段を備えることができる。これにより、応答手段がより精度の高い対話を生成し、ユーザーにとって有意義なやり取りが実現する。また、生成系AIが応答する際には、通話者の国や地域に基づいた言語選択機能を持たせ、多言語対応の自動応答が可能となる。
送信手段に関しては、通話内容の文書化に加えて、重要なキーワードやフレーズのハイライト機能を設けることで、ユーザーが文書を素早く把握できるようにする。さらに、文書化された内容に基づいて自動的にアクションアイテムを生成し、ユーザーのタスクリストに追加する機能を追加することができる。
感情認識手段においては、通話中にユーザーの感情が変化した場合、その変化をリアルタイムで検知し、応答手段の対話のトーンやテンポを動的に調整する機能を持たせることができる。また、特定の感情が検出された場合には、それに応じた特別なサポートやアドバイスを提供する専門家への連絡を促すプロトコルも組み込むことができる。
応答手段には、着信者の過去の通話履歴や関連データを分析し、より個人化された応答を提供するパーソナライゼーション機能を追加することができる。これにより、ユーザーにとってより関連性が高い情報を提供し、応答の有用性を高めることが可能となる。
送信手段に関しては、文書化された通話内容に基づいてフォローアップのアクションを提案する機能を追加することができる。例えば、通話内容に含まれるタスクや予定に対して、カレンダーアプリへの自動登録機能を統合することで、ユーザーの時間管理をサポートする。
さらに、感情認識手段は、ユーザーのストレスレベルや緊張感を検知し、適宜、ストレス軽減のためのアドバイスやリラクゼーションコンテンツへのリンクを提供する機能を備えることができる。これにより、ユーザーの精神的な健康を支援し、総合的なウェルビーイングを促進することができる。
本形態例のシステムには、さらなる機能向上を図るための複数の追加機能が考慮される。例えば、未登録または非通知の着信に対して、通話者の声紋を分析し、以前の通話データと照合することにより、通話者の身元を特定する声紋認識手段を追加することができる。これにより、通話者が過去にシステムとやり取りしたことがある場合、その情報を基に応答手段がより適切な対応を行うことが可能となる。また、声紋認識手段は、セキュリティ対策としても機能し、ユーザーに対する信頼性の高い通話体験を提供する。
送信手段についても、通話内容を文書化する際に、通話の内容を構造化し、情報の重要度に応じてテキストの階層化を行う機能を考慮する。これにより、ユーザーは文書を読む際に重要な情報をより迅速に把握できるようになる。さらに、文書化された内容をユーザーの好みや過去の行動パターンに合わせてカスタマイズすることで、より個人的な体験を提供することが可能となる。
感情認識手段では、通話中にユーザーのストレスレベルを検知し、ストレスが高いと推定される場合には、通話内容に関連したリラクゼーション方法や心理的サポートへの案内を提供する。これにより、ユーザーが通話を通じてリラックスし、ストレスを軽減できるようなサービスを提供する。
さらに、応答手段には、通話内容に基づいてユーザーへのフォローアップアクションを自動的に提案する機能を追加する。例えば、通話中に提案された製品やサービスに関する追加情報へのリンクを提供したり、次の行動ステップを提案することで、ユーザーの意思決定をサポートする。
応答手段の改善としては、通話者の意図に応じて自動的に応答スタイルを変更する機能を検討する。たとえば、通話者が緊急の状況を示している場合には、迅速かつ的確な指示を提供するようにAIを調整する。このような対応により、通話者のニーズに即応できるようなシステムを実現する。
これらの追加機能は、ユーザーエクスペリエンスの向上を目指すとともに、通話内容の正確な把握と迅速な対応を可能にするためのものである。また、それぞれの機能は、データ処理装置やスマートデバイスの制御部の能力を最大限に活用し、システムの有用性をさらに高めることが期待される。
（形態例２）
本発明を実施するための形態は、通話終了後に用件をチャットGPTで文書化し、メッセンジャーアプリやメールで送信する際に、ユーザの感情を認識する感情エンジンを使用して文書の表現を調整する調整手段とを具備するシステムである。具体的には、文書化された用件を感情エンジンに入力し、ユーザの感情を分析する。分析結果に基づいて、文書の表現を適切に調整する。例えば、ユーザが喜びや興奮を感じている場合には、より明るく活気のある表現を使用することで、ユーザの感情を共有する。
調整手段は、例えば、データ処理装置１２の特定処理部２９０によって実現される。また、調整手段の一部または全部は、例えば、スマートデバイス１４の制御部４６Ａによって実現されてもよい。各手段と装置や制御部との対応関係は、上述した例に限定されず、種々の変更が可能である。
会話内容抽出手段は、音声認識技術を駆使して通話内容をテキストデータに変換する。また、会話内容抽出手段は、音声信号を受信した後、ノイズリダクション手段を用いて背景雑音を除去し、音声からテキストへの変換精度を向上させる。さらに、会話内容抽出手段は、最先端の音声認識技術を用いて、通話内容を精確にテキストデータへ変換し、その過程でノイズ除去手段が背景雑音を除去し、変換精度を向上させる。
テキスト処理手段は、変換されたテキストデータの構文上の誤りを修正し、言語の流暢さを保つために文法検査手段を介して文法チェックを行う。また、テキスト処理手段は、テキスト化されたデータの文法検査手段が語法の正確性を保証し、文書の自然な流れを維持するための調整を行う。
感情認識手段には、テキストマイニングと感情分析技術に基づく感情抽出手段が含まれ、生成されたテキストデータの言葉遣いや文脈からユーザの感情を推定する。また、感情抽出手段は、様々な感情を表す単語やフレーズ、文法的パターンを識別し、それらをポジティブ、ネガティブ、ニュートラルなどの感情カテゴリに分類する。また、感情強調手段は、抽出された感情に応じてテキストのトーンや言い回しを調整し、ユーザの感情状態をより適切に伝えるための修正を行う。さらに、感情抽出手段は、文書に表れる言語パターンからユーザの感情を読み取り、それを基に文書のトーンを調整する。
調整手段は、感情認識手段によって分析された感情データを基に、テキストの表現を変化させる。また、表現強化手段は、喜びや興奮などのポジティブな感情が検出された場合、使われる語彙をより明るく活気のあるものに置き換え、メッセージに好意的な印象を与える。さらに、表現緩和手段は、悲しみや怒りなどのネガティブな感情が検出された場合、メッセージのトーンを穏やかにし、共感と理解を示すような言い回しを選択する。また、ユーザの感情がポジティブな場合は、表現強化手段が文書に活力を与え、ネガティブな感情を示す場合は、表現緩和手段によって、より穏やかな表現を使用する。
メッセージの送信手段には、メッセンジャーアプリやメールクライアントとの連携機能が含まれ、調整されたテキストを適切な形式で送信する。また、送信プロトコル選定手段は、受信者のプラットフォームや設定に合わせて最適な送信プロトコルを選択し、メッセージの配信を保証する。また、ユーザインタフェース提示手段は、送信前に文書の最終レビューを行うためのプレビュー画面を提供し、ユーザが必要に応じて最終的な修正を加えることができるようにする。さらに、メッセージ送信手段がメッセンジャーアプリやメールクライアントに適したフォーマットで調整されたテキストを送信し、プレビュー画面がユーザが文書を最終確認するためのインターフェースを提供する。
以上のプロセスは、ユーザの使用するデバイスや設定に応じて、スマートデバイスの制御部やデータ処理装置に内蔵された特定処理部で実現される。また、これらの手段は、モジュール化されたコンポーネントとして設計され、システムの構成要素としての交換や拡張が容易に行われる。さらに、各手段の対応関係はフレキシブルに設定されており、システムのアップグレードやカスタマイズに対応するための多様な変更が可能である。また、この一連のプロセスは、デバイスや環境に応じて柔軟に対応できるようにモジュール化されており、システムのアップグレードやカスタマイズが容易に行えるように設計されている。
この形態例を更に拡張して、ユーザーの感情をより深く理解し、コミュニケーションの質を高める機能を追加することができる。例えば、感情エンジンにビデオチャット中の表情認識機能を組み込むことで、視覚的な情報からも感情を分析し、より正確な感情判断を行う。感情認識の精度を向上させるために、ユーザーの声のトーンやピッチの分析も行い、テキストに反映させることが可能である。さらに、ユーザーの過去のコミュニケーション履歴や反応パターンを分析することで、個人の感情表現スタイルを学習し、それに応じたよりパーソナライズされたテキスト調整を実現する。
テキスト処理手段は、表現の多様性と創造性を高めるために、文学作品や詩などからインスピレーションを得た言い回しを提案する機能を持つ。これにより、ユーザーの感情がより豊かに表現される。また、社会的コンテキストや文化的背景を考慮し、コミュニケーションが行われる環境や状況に合わせた適切な表現を選択することもできる。
メッセージ送信手段には、送信されるテキストが受け手の感情に与える影響を予測する機能を追加し、ユーザーがより責任を持ってコミュニケーションを取れるようにする。さらに、受信者の反応をAIが予測し、その情報をもとにユーザーが次に取るべきコミュニケーション戦略を提案することも可能である。
全体として、これらの機能は、ユーザーが感情を的確かつ敏感にコミュニケーションに反映させることをサポートし、より深い人間関係の構築に貢献する。また、これらの進化した手段は、個人だけでなく、企業のカスタマーサポートやCRMシステムにおいても、顧客との関係を深めるために有効活用できる。さらに、これらのシステムは、ユーザー教育やカウンセリングといった人間の感情が重要な役割を果たす分野での応用が期待される。
本システムは、感情エンジンを活用してユーザーの感情に応じた文書の表現調整を提供する。この機能を拡張するために、ユーザーの生体情報を取得するセンサーを統合し、心拍数や皮膚の導電率などの生理的反応から感情をより正確に読み取ることができる。センサーからのデータはリアルタイムで分析され、文書のトーンを即座に調整することが可能となる。
さらに、ユーザーの日常的なコミュニケーションを継続的に分析し、その人固有の表現スタイルや好みを把握する個性化学習機能を搭載する。この機能により、システムはユーザーの個性を反映したより自然で個別化された文書の提案が可能になる。
また、マルチリンガル対応を強化し、さまざまな言語での感情的ニュアンスを捉えることができるようにする。この機能により、国際的なコミュニケーションや多言語を話すユーザー間での理解を深めることができる。
ユーザーのプライバシー保護のために、感情データの匿名化や暗号化を行い、セキュリティを強化する機能も追加する。これにより、ユーザーは安心してシステムを利用できるようになる。
教育やカウンセリングの分野での応用を目指し、感情認識の結果を活用してコミュニケーションスキルのトレーニングをサポートする機能を開発する。トレーニングプログラムには、感情表現の練習や、適切なコミュニケーション手法の学習が含まれる。
最後に、システムのユーザビリティを向上させるために、ユーザーインターフェースをリッチかつ直感的なものにする。さまざまなジェスチャーや音声コマンドをサポートし、ユーザーが文書の調整プロセスに容易に介入できるようにする。これにより、ユーザーは自分の意志で表現を微調整し、より個人的なコミュニケーションを実現できる。
（形態例３）
本発明を実施するための形態は、通話中にユーザの感情を認識する感情エンジンを使用して、応答や対話の内容をユーザの感情に合わせて調整する調整手段とを具備するシステムである。具体的には、通話中にユーザの声のトーンや言葉の選択などを感情エンジンに入力し、ユーザの感情を推定する。推定された感情に基づいて、生成系AIの応答や対話の内容を調整する。例えば、ユーザが悲しい感情を示している場合には、共感の言葉を用いて励ましや支援を提供する。
調整手段は、例えば、スマートデバイス１４のマイクロフォン３８Ｂを用いてユーザの声のトーンを検出し、制御部４６Ａによって分析を行い、データ処理装置１２の特定処理部２９０によって感情を推定する。また、調整手段の一部または全部は、例えば、データ処理装置１２の特定処理部２９０によって実現されてもよい。各手段と装置や制御部との対応関係は、上述した例に限定されず、種々の変更が可能である。
音声データ収集手段は、高感度でノイズキャンセリング機能を備えたマイクロフォンが含まれる。このマイクロフォンは、周囲の雑音を除去し、ユーザの声を明瞭に録音するためのデジタル信号処理アルゴリズムを搭載している。また、音声データ収集手段は、ユーザの発話から微細な音響特性を把握する高感度マイクロフォンを採用し、これにデジタル信号処理技術を組み合わせて周囲の雑音を効果的に除去する。このマイクロフォンは、ユーザの声の特性を正確に捉え、感情の変化を検出するための基礎データを提供する。
音声特徴抽出手段は、音声信号から人間の感情を反映する可能性のある特徴量を抽出する。この抽出手段は、音響特徴分析を行うためのスペクトログラム解析機能やピッチ追跡機能、音響モデルを用いた感情識別機能が含まれる。また、音声特徴抽出手段は、スペクトル分析機能やピッチ解析機能といった音響解析ツールを用いて、音声信号から感情を示唆する特徴量を抽出し、これらのデータを感情推定モデルへと供給する。
感情分析手段には、機械学習に基づいた感情推定モデルが含まれ、音声特徴抽出手段によって抽出された特徴量を入力として、ユーザの感情状態を推定する。感情推定モデルは、トレーニングデータに基づいて訓練されたニューラルネットワーク、サポートベクターマシン、決定木などの分類器から構成され、ユーザの感情をポジティブ、ネガティブ、中立などのカテゴリに分類する。感情推定モデルは、継続的な学習を通じてその精度を向上させ、ユーザの感情に対する認識の微妙な変化にも対応できるように進化する。また、感情分析手段は、機械学習技術を活用した感情推定モデルを有し、抽出された音声特徴を元にユーザの感情を分析する。このモデルは、様々な機械学習アルゴリズムを組み合わせて、ユーザの発話から感情カテゴリーを識別し、これに基づいてユーザの感情状態を推定する。推定された感情状態は、ユーザの発話や振る舞いに対するシステムの反応を調整するための重要な情報となる。
対話調整手段には、生成系AIモデルを用いた応答生成機能が含まれ、感情分析手段によって推定された感情に応じて、対話の内容を動的に調整する。応答生成機能は、自然言語生成技術を駆使し、ユーザの感情に適した言葉選びやトーンを用いて応答文を生成する。例えば、ユーザが悲しい感情を示している場合、共感表現や慰めの言葉を含む応答が生成される。この応答生成機能は、会話の文脈を考慮し、ユーザの感情とコミュニケーションの目的に合致した内容を提供するためのコンテキストアウェア処理機能を備える。また、生成された応答は、ユーザにとって自然であり、感情的なニーズを満たすように構築される。応答生成機能は、大規模な会話データセットに基づいて訓練された機械学習モデルによって実現され、ユーザの言葉遣いや話し方に適応することができる。また、対話調整手段には、生成系AIモデルを用いた応答生成機能が含まれ、推定された感情に適切に応じた対話内容を生成する。この機能は、ユーザの感情に対応する言葉選びや対話トーンを選定し、ユーザの現在の感情や会話の文脈に合わせた反応を提供する。応答生成機能は、コンテキストアウェアな処理を行い、ユーザが必要とする情報やサポートを適切な形で提供するために設計されている。この機能は、大量の会話データから学習されたモデルを基に、ユーザの言葉遣いや情緒に適応した応答を生成することが可能である。
センサーを含まないデータ収集手段としては、ユーザがシステムに直接入力したテキスト情報や、通話記録から得られるメタデータが考えられる。これらは、ユーザの行動パターンや好みを分析する際の補足情報として利用され、感情エンジンの精度向上や応答生成機能の最適化に寄与する。ユーザが入力するテキスト情報は、感情分析手段の一部として、感情推定モデルの訓練データとしても活用される。
本システムは、通話中にユーザの感情をリアルタイムで分析し、対話内容を自動調整する能力を有しているため、さらに細かな感情の変化を捉えるために、音声データに加えて、表情認識技術を統合することができる。ウェブカメラやスマートデバイスのカメラを活用して、ユーザの顔の表情を分析し、感情分析の精度を向上させる。この追加された表情認識機能は、ユーザの感情をより正確に認識し、さらに微細な感情変化に応じた対話調整が可能となる。
また、ユーザの生理的シグナルを捉えるためのウェアラブルデバイスを組み込むことも考えられる。心拍数や皮膚電気活動など、生理的反応を測定することで、声のトーンや表情だけでは捉えきれない感情の深層を解析する。これらのデータを統合することで、感情分析の精度はさらに向上し、より適切な対話応答を生成することができる。
対話調整手段には、ユーザの文化的背景や個人的な価値観を考慮したカスタマイズ機能を追加することも有効である。ユーザプロファイルを構築し、その情報に基づいて、対話のトーンや内容をさらにパーソナライズする。これにより、ユーザ一人ひとりに合わせたきめ細やかなサポートを提供することが可能となる。
さらに、感情推定モデルの進化を促すために、クラウドソーシングによる感情データの収集や、多様なユーザからのフィードバックを取り入れて、モデルを継続的にアップデートする仕組みを構築する。これにより、多様な感情表現や言語に対応できる柔軟なシステムとなる。
また、教育やメンタルヘルスケアの分野における応用も検討することができる。例えば、教育分野では、学生の感情に適応した教材の提示やカウンセリングセッションでの使用が考えられる。メンタルヘルスケアでは、ユーザの感情を認識し、ストレスや不安を軽減するための対話支援を行う。これにより、ユーザが抱える問題に対してより効果的なアプローチが可能となる。
システムのプライバシー保護に関しても、ユーザの感情データを安全に保管し、適切なアクセス制御と暗号化技術を用いて、情報漏洩のリスクを最小限に抑えるためのセキュリティ対策を強化する。これにより、ユーザは安心してシステムを利用することができる。
最終的には、このシステムが提供するパーソナライズされた対話体験が、ユーザの生活の質を向上させるようなサービスへと発展することが期待される。
本発明の形態は、通話のみならず、ビデオ会議やオンライン教育のプラットフォームにも適用可能である。例えば、講師が生徒の感情をリアルタイムで把握し、カリキュラムの進行を感情に合わせて調整することで、より効果的な学習経験を提供する。ビデオ会議においても、参加者の感情を反映した対話管理が行われ、生産的かつポジティブな会議環境を促進する。
また、本システムには、感情反応に基づいた健康状態の監視機能を追加することも可能である。例えば、ユーザの声のトーンや話し方が一定期間にわたってネガティブな感情を示している場合、メンタルヘルスの専門家に通知を送り、必要に応じた介入を促すことができる。
さらに、感情エンジンの高度化に向け、ユーザの日常生活における感情パターンを分析し、その情報を元に長期的な感情管理やストレス軽減のアドバイスを提供する機能を組み込む。ユーザの生活リズムや活動パターンを分析し、感情の波を予測することで、適切なタイミングでリラクゼーションやモチベーション向上のためのコンテンツを提案する。
このシステムは、カスタマーサポートの分野でも応用が期待される。例えば、コールセンターのオペレーターが顧客の感情をリアルタイムで把握し、不満や怒りなどのネガティブな感情を検出した際には、即座に対応策を講じ、顧客満足度の向上に寄与する。
また、ゲームやエンターテインメントの分野でも、ユーザの感情に応じてコンテンツを動的に変化させることで、没入感や楽しさを増幅させる効果が期待される。ゲーム内のキャラクターがプレイヤーの感情に反応し、ストーリー展開や対話内容が変化することで、よりパーソナライズされた体験を実現する。
さらに、音声アシスタントや仮想現実（VR）との統合を図り、ユーザの感情に対してより自然な対話を実現する。音声アシスタントはユーザの感情を把握し、個々のニーズに合わせた情報やサービスを提供する。VR環境では、ユーザの感情に応じてシナリオや環境が変化し、リアルタイムで感情に合わせた体験を提供する。
本システムは、ユーザインタフェース（UI）やユーザーエクスペリエンス（UX）の設計においても革新をもたらす可能性を秘めている。感情認識技術を利用して、ユーザの感情に最適化されたUIやUXを提供し、利用者の満足度を高める。例えば、ウェブサイトやアプリケーションがユーザの感情をリアルタイムで把握し、コンテンツの提示方法やインタラクションの形式を調整する。
最終的には、このシステムが提供する感情調整機能が、人間関係の質を向上させ、コミュニケーションの効果を高めるツールとして社会に広く浸透していくことが期待される。
(Example 1)
The embodiment of the present invention is a system that includes a response means in which a generative AI responds when a call is received from an unregistered or unnotified phone number, and a transmission means in which the matter is documented with ChatGPT after the call ends and sent via a messenger app or email. Furthermore, as an emotion recognition means incorporating an emotion engine that recognizes the user's emotions, the system analyzes the tone of the user's voice and the choice of words during the call to estimate the emotion. Based on the estimated emotion, the generative AI's response and the wording of the documented matter are adjusted. For example, if the user shows anxiety, calmer wording is used to provide a sense of security.
The response means and the transmission means are realized, for example, by the specific processing unit 290 of the data processing device 12. The emotion recognition means detects the tone of the user's voice using, for example, the microphone 38B of the smart device 14, analyzes it using the control unit 46A, and estimates the emotion using the specific processing unit 290 of the data processing device 12. In addition, a part or all of the response means, the transmission means, and the emotion recognition means may be realized, for example, by the control unit 46A of the smart device 14. The correspondence between each means and the device or control unit is not limited to the above example, and various changes are possible.
The response means includes a dialogue generation means that combines natural language processing and speech synthesis. The dialogue generation means automatically responds to unregistered or unnotified incoming calls, enabling a dialogue with the user. When an unregistered or unnotified incoming call is detected, the generative AI responds using a pre-trained conversation model. This conversation model is trained on a variety of dialogue data so that it can handle various scenarios, and provides appropriate answers and guidance in response to the caller's questions and requests. In addition, the response means has an AI dialogue generation function, and can automatically provide an appropriate response to an incoming call from an unregistered or unnotified phone number. This function combines a dialogue management function for understanding the caller's purpose and request at the early stage of the call and providing a corresponding response, and a voice generation function for converting the generated response into natural-sounding speech. The dialogue management function can analyze the caller's intention based on the detection of specific keywords and phrases and generate an appropriate response. The voice generation function can convert a text-based response into speech in real time, providing the caller with a natural conversation experience.
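As a non-limiting illustration, the keyword-based intent analysis of the dialogue management function may be sketched as follows. The keyword table, the intent labels, the response texts, and the function names detect_intent and respond are hypothetical and are not specified in the disclosure; speech recognition and speech synthesis are assumed to happen outside this sketch.

```python
import re

# Hypothetical keyword table; the disclosure does not list keywords or intents.
INTENT_KEYWORDS = {
    "delivery": ("delivery", "package", "parcel"),
    "family": ("mom", "dad", "son", "daughter", "it's me"),
    "sales": ("offer", "discount", "subscription"),
}

# Hypothetical canned responses, one per intent.
RESPONSES = {
    "delivery": "Thank you. Please state the delivery passphrase.",
    "family": "Understood. Please answer the verification question.",
    "sales": "Thank you. Please leave a message and it will be passed on.",
    "unknown": "Hello. May I ask who is calling and the purpose of your call?",
}


def detect_intent(utterance: str) -> str:
    """Return the first intent whose keyword appears as a whole word or phrase."""
    text = utterance.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        for keyword in keywords:
            # Word boundaries keep "son" from matching inside "reason".
            if re.search(r"\b" + re.escape(keyword) + r"\b", text):
                return intent
    return "unknown"


def respond(utterance: str) -> str:
    """Map a transcribed caller utterance to a text response."""
    return RESPONSES[detect_intent(utterance)]
```

In this sketch the returned text would then be handed to the voice generation function; a trained conversation model could replace the keyword table without changing the interface.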
The transmission means includes a documentation means that uses a chatbot or natural language understanding technology. After the call ends, the contents of the call are accurately documented using an advanced natural language understanding model such as ChatGPT. The documentation means has the ability to extract and summarize the main points of the call, and can generate a document that conveys the matter concisely and clearly. The generated document is sent to the user via a messenger app or email. This process includes a function for linking to the user's email address or messenger account, and the document is automatically sent in the appropriate format. The transmission means can also convert the contents of the call into text and provide it in a form accessible to the user. After the call ends, the call content documentation function is activated and can extract and summarize the main points of the conversation. This summarized text is sent to the user's designated email address or messenger app through an automatic delivery function. This automatic delivery function includes an email sending function and an app linking function that format the document properly and ensure that it is delivered to the designated destination.
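As a non-limiting illustration, the formatting step of the automatic delivery function may be sketched with the Python standard library as follows. The function name build_summary_email and the sender address are hypothetical; the summary text is assumed to have been produced already by the documentation means, and actual transmission (for example over SMTP) is omitted.

```python
from email.message import EmailMessage


def build_summary_email(summary: str, caller_number: str, recipient: str) -> EmailMessage:
    """Wrap an already-generated call summary in a plain-text email message."""
    msg = EmailMessage()
    msg["Subject"] = f"Call summary: {caller_number}"
    msg["From"] = "call-assistant@example.com"  # hypothetical sender address
    msg["To"] = recipient
    msg.set_content(summary)  # plain-text body
    return msg
```

The resulting message object could be passed to a mail transport, and a messenger app would need its own formatter with the same inputs.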
The emotion recognition means includes a voiceprint analysis means for performing voice analysis, and a language analysis means for estimating emotions from the choice of words. The voiceprint analysis means can detect characteristics of the user's voice, such as tone, pitch, and speed, using the microphone of the smart device, and can estimate the user's emotional state from these voice characteristics. The language analysis means can analyze the content of the user's speech and extract emotional context from the words and phrases used. These analysis results are used when the generative AI responds and when the wording of the documented matter is adjusted. For example, if the user shows anxiety, the wording of the response or document is adjusted to be gentler and more reassuring. In addition, the emotion recognition means has a function of estimating emotions from the characteristics of the user's voice and language use during a call. The voiceprint analysis function can extract characteristics such as the pitch, strength, and speed of the voice from the voice data collected by the microphone, and can estimate emotions by analyzing these characteristics. The language emotion analysis function can process the language data during a call, analyze the emotional meaning of the words and phrases used, and grasp the user's emotional state. These analysis results are used to adjust the tone of the AI's responses and the wording of the documented call content, so that an appropriate emotional response can be provided to the user.
Data collection means that do not involve sensors include text data entered by users themselves and feedback on system usage. These are provided to the system through the user input acceptance function and the feedback collection function, and are used as a valuable source of information for service improvement.
These means are intended to respond quickly and effectively to user requests and improve the quality of communication. In addition, the implementation of each means is flexibly performed by the data processing device and the control unit of the smart device, and can be modified in various ways to improve the efficiency and usability of the system.
The system of the present invention can be equipped, as an additional function, with a summary prediction means for predicting the caller's intention before answering an unregistered or unnotified call. This allows the response means to generate a more accurate dialogue, realizing a meaningful exchange for the user. In addition, when the generative AI responds, it can have a language selection function based on the caller's country or region, enabling automatic response in multiple languages.
As for the transmission means, in addition to documenting the contents of the call, a function for highlighting important keywords and phrases can be provided to help users grasp the document quickly. Furthermore, a function can be added that automatically generates action items based on the documented content and adds them to the user's task list.
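As a non-limiting illustration, keyword highlighting and action item generation may be sketched as follows. The asterisk markup, the cue phrases in ACTION_CUES, and the function names are hypothetical; a practical system would more likely rely on the language model than on fixed cues.

```python
import re

# Hypothetical cue phrases that suggest a sentence describes something to do.
ACTION_CUES = ("please", "call back", "need to", "must")


def highlight(text, keywords):
    """Wrap whole-word keyword matches in asterisks, ignoring case."""
    if not keywords:
        return text
    alternatives = "|".join(re.escape(k) for k in keywords)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: "*" + m.group(0) + "*", text)


def extract_action_items(text):
    """Return the sentences that contain at least one action cue."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [s for s in sentences if any(cue in s.lower() for cue in ACTION_CUES)]
```

The extracted sentences could then be appended to the user's task list by whatever task application is linked to the system.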
The emotion recognition means can detect changes in the user's emotions during a call in real time and dynamically adjust the tone and tempo of the conversation of the response means. Protocols can also be built in that, when a particular emotion is detected, prompt the user to contact an expert who can provide special support or advice according to the emotion.
A personalization function can be added to the response means that analyzes the caller's past call history and related data to provide a more personalized response. This makes it possible to provide more relevant information to the user and increase the usefulness of the response.
Regarding the transmission means, a function can be added that suggests follow-up actions based on the documented content of the call. For example, integrating a function that automatically registers tasks and appointments mentioned in the call in a calendar app helps users manage their time.
Furthermore, the emotion recognition means may be capable of detecting the user's stress level or tension and providing appropriate advice on how to reduce stress or links to relaxation content, thereby supporting the user's mental health and promoting overall well-being.
A number of additional functions are contemplated for the system of this embodiment in order to further improve its functionality. For example, a voiceprint recognition means can be added that, for an unregistered or unnotified call, identifies the caller by analyzing the caller's voiceprint and comparing it with previous call data. If the caller has previously interacted with the system, this allows the response means to respond more appropriately based on that information. The voiceprint recognition means also functions as a security measure, providing a reliable call experience for the user.
Regarding the transmission means, a function is also contemplated that, when documenting the call, structures the contents and arranges the text hierarchically according to the importance of the information. This allows users to grasp important information more quickly when reading the document. In addition, the documented content can be customized according to the user's preferences and past behavior patterns to provide a more personal experience.
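As a non-limiting illustration, matching a caller's voiceprint against previous call data may be sketched as a cosine similarity comparison between fixed-length voice embeddings. The embedding vectors, the threshold of 0.85, and the function names are hypothetical; how the embedding is computed from audio is outside this sketch and is not specified in the disclosure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_caller(embedding, enrolled, threshold=0.85):
    """Return the enrolled caller ID with the highest similarity at or above
    the threshold, or None when no enrolled voiceprint is close enough."""
    best_id, best_score = None, threshold
    for caller_id, reference in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_id, best_score = caller_id, score
    return best_id
```

A return value of None would correspond to a caller with no prior interaction, in which case the response means would fall back to its default handling.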
The emotion recognition means detects the user's stress level during a call, and if it is estimated that the user is under high stress, it provides guidance on relaxation methods and psychological support related to the content of the call, thereby providing a service that allows users to relax and reduce stress through the call.
Additionally, a function is added to the response means that automatically suggests follow-up actions to the user based on the content of the call. For example, it supports the user's decision-making by providing links to additional information about products or services proposed during the call, or by suggesting next steps.
As an improvement to the response means, a function that automatically changes the response style according to the caller's intention is contemplated. For example, if the caller indicates an emergency, the AI is adjusted to provide quick and accurate instructions. This realizes a system that can respond immediately to the caller's needs.
These additional functions are intended to improve the user experience and enable accurate understanding of call content and rapid response. Each function is expected to maximize the capabilities of data processing devices and smart device control units, further enhancing the usability of the system.
(Example 2)
The embodiment of the present invention is a system that includes an adjustment means which, when the matter is documented with ChatGPT after the call ends and sent by messenger app or email, adjusts the wording of the document using an emotion engine that recognizes the user's emotions. Specifically, the documented matter is input to the emotion engine, and the user's emotions are analyzed. Based on the analysis results, the wording of the document is adjusted appropriately. For example, if the user is feeling happy or excited, brighter and livelier wording is used so that the user's emotions are shared.
The adjustment means is realized, for example, by the specific processing unit 290 of the data processing device 12. In addition, a part or all of the adjustment means may be realized, for example, by the control unit 46A of the smart device 14. The correspondence between each means and the device or control unit is not limited to the above example, and various changes are possible.
The conversation content extraction means converts the contents of the call into text data by making full use of voice recognition technology. After receiving the voice signal, the conversation content extraction means uses noise reduction means to remove background noise and improve the accuracy of the voice-to-text conversion. Furthermore, the conversation content extraction means uses cutting-edge voice recognition technology to accurately convert the contents of the call into text data, and in the process, the noise reduction means removes background noise and improves the conversion accuracy.
The text processing means corrects syntactic errors in the converted text data and performs grammar checking via the grammar checking means to maintain linguistic fluency. Within the text processing means, the grammar checking means also ensures that the usage in the transcribed data is accurate and makes adjustments to maintain the natural flow of the document.
The emotion recognition means includes an emotion extraction means based on text mining and emotion analysis techniques, which estimates the user's emotion from the wording and context of the generated text data. The emotion extraction means also identifies words, phrases, and grammatical patterns that express various emotions and classifies them into emotion categories such as positive, negative, and neutral. The emotion emphasis means also adjusts the tone and phrasing of the text according to the extracted emotions, making modifications to better convey the user's emotional state. The emotion extraction means also reads the user's emotion from the language patterns expressed in the document and adjusts the tone of the document based on the emotion.
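As a non-limiting illustration, the classification of text into positive, negative, and neutral categories may be sketched with a small word lexicon as follows. The word lists and the function name classify_emotion are hypothetical; the disclosure does not fix a lexicon or a scoring rule, and a practical emotion extraction means would also consider context and grammatical patterns.

```python
import re

# Hypothetical lexicons; real systems would use far larger, curated resources.
POSITIVE_WORDS = {"glad", "happy", "great", "thanks", "excited"}
NEGATIVE_WORDS = {"sad", "angry", "worried", "upset", "terrible"}


def classify_emotion(text):
    """Count lexicon hits and return 'positive', 'negative', or 'neutral'."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```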
The adjustment means changes the wording of the text based on the emotion data analyzed by the emotion recognition means. When a positive emotion such as joy or excitement is detected, the expression enhancement means replaces the vocabulary used with brighter and livelier words to give the message a favorable impression. When a negative emotion such as sadness or anger is detected, the expression mitigation means softens the tone of the message and selects phrases that show empathy and understanding. When the user's emotion is positive, the expression enhancement means energizes the document, and when it is negative, the expression mitigation means uses gentler wording.
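As a non-limiting illustration, the expression enhancement means and the expression mitigation means may be sketched as vocabulary substitution driven by the detected emotion category. The substitution tables and the function name adjust_expression are hypothetical; in practice the rewriting would more plausibly be delegated to a generative model.

```python
import re

# Hypothetical substitution tables for each adjustment direction.
ENHANCE = {"good": "wonderful", "ok": "great"}
SOFTEN = {"failed": "did not go as planned", "problem": "matter to look into"}


def adjust_expression(text, emotion):
    """Apply enhancement for positive emotion, mitigation for negative emotion,
    and leave the text unchanged otherwise."""
    if emotion == "positive":
        table = ENHANCE
    elif emotion == "negative":
        table = SOFTEN
    else:
        table = {}
    for plain, replacement in table.items():
        # Whole-word replacement only, so "ok" does not alter "token".
        text = re.sub(r"\b" + re.escape(plain) + r"\b", replacement, text)
    return text
```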
The message sending means includes a function for linking with a messenger application or an email client and sends the adjusted text in an appropriate format. Furthermore, the transmission protocol selection means selects an optimal transmission protocol according to the platform and settings of the recipient and ensures delivery of the message. Furthermore, the user interface presentation means provides a preview screen for a final review of the document before sending, allowing the user to make final corrections as necessary. Furthermore, the message sending means sends the adjusted text in a format suitable for the messenger application or the email client, and the preview screen provides an interface for the user to make a final check of the document.
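As a non-limiting illustration, the selection performed by the transmission protocol selection means may be sketched as a choice among the channels configured for a recipient. The dictionary keys "messenger", "email", and "preferred" and the function name select_channel are hypothetical; the disclosure does not enumerate protocols or recipient settings.

```python
def select_channel(recipient):
    """Pick a delivery channel: the recipient's preferred one if it is
    configured, otherwise the first configured channel."""
    available = [c for c in ("messenger", "email") if recipient.get(c)]
    if not available:
        raise ValueError("no delivery channel configured for recipient")
    preferred = recipient.get("preferred")
    return preferred if preferred in available else available[0]
```

Falling back to another configured channel when the preferred one is unavailable is one possible reading of ensuring delivery; other policies, such as retrying or sending on both channels, are equally compatible with the text.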
The above processes are realized by the control unit of the smart device or by the specific processing unit built into the data processing device, depending on the device and settings used by the user. Moreover, these means are designed as modularized components, and can be easily replaced or expanded as system components. Furthermore, the correspondence between each means is set flexibly, and various changes can be made to accommodate system upgrades and customization. Moreover, this series of processes is modularized so that it can flexibly respond to devices and environments, and is designed to facilitate system upgrades and customization.
This embodiment can be further expanded to add a function for deeper understanding of the user's emotions and improving the quality of communication. For example, by incorporating a facial expression recognition function during video chat into the emotion engine, emotions can be analyzed from visual information as well, resulting in more accurate emotion judgment. To improve the accuracy of emotion recognition, the tone and pitch of the user's voice can also be analyzed and reflected in the text. Furthermore, by analyzing the user's past communication history and response patterns, the system can learn the individual's emotional expression style and adjust the text accordingly in a more personalized manner.
The text processing means has a function to suggest phrases inspired by literary works and poetry to enhance the diversity and creativity of expressions, allowing users to express their emotions more richly. It can also select appropriate expressions according to the environment and situation in which the communication takes place, taking into account the social context and cultural background.
A function is added to the message sending means that predicts the emotional impact of the sent text on the recipient, allowing users to communicate more responsibly. In addition, the AI can predict the recipient's reaction and use that information to suggest the communication strategy the user should take next.
Overall, these features will help users to reflect emotions accurately and sensitively in their communications, contributing to building deeper human relationships. These advanced methods can be effectively used by individuals as well as corporate customer support and CRM systems to deepen customer relationships. Furthermore, these systems are expected to be applied in fields where human emotions play an important role, such as user education and counseling.
The system leverages an emotion engine to provide document expression adjustments based on the user's emotions. To extend this functionality, sensors are integrated to capture the user's biometric information, allowing for a more accurate reading of emotions from physiological responses such as heart rate and skin conductivity. Data from the sensors is analyzed in real time, allowing the tone of the document to be adjusted on the fly.
In addition, it is equipped with a personalized learning function that continuously analyzes the user's daily communication to understand the user's unique expression style and preferences, allowing the system to suggest more natural and personalized documents that reflect the user's personality.
Multilingual support is also enhanced so that emotional nuances in different languages can be captured. This improves international communication and understanding among users who speak multiple languages.
To protect user privacy, functions are also added that anonymize and encrypt emotion data and strengthen security. This allows users to use the system with peace of mind.
With a view to applications in the fields of education and counseling, a function is developed that uses the results of emotion recognition to support communication skills training. The training program includes practicing emotional expression and learning appropriate communication techniques.
Finally, to improve the usability of the system, the user interface is made rich and intuitive. Various gestures and voice commands are supported so that users can easily intervene in the document adjustment process. This allows users to fine-tune the wording at their own discretion and achieve more personal communication.
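As a non-limiting illustration, one way to keep emotion data from being tied directly to a user is keyed pseudonymization with an HMAC, sketched below using the Python standard library. This is a weaker measure than full anonymization and does not by itself provide encryption, which would need a dedicated cryptographic library. The record fields and the function names are hypothetical.

```python
import hashlib
import hmac


def pseudonymize_user_id(user_id: str, secret_key: bytes) -> str:
    """Replace a user ID with a keyed SHA-256 digest (HMAC), so the same user
    maps to the same token but the ID cannot be read back without the key."""
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_record(record: dict, secret_key: bytes) -> dict:
    """Keep only the pseudonymized user token and the emotion label,
    dropping every other field of the input record."""
    return {
        "user": pseudonymize_user_id(record["user_id"], secret_key),
        "emotion": record["emotion"],
    }
```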
(Example 3)
The embodiment of the present invention is a system that includes an adjustment means which uses an emotion engine that recognizes the user's emotions during a call to adjust the response and dialogue content to match the user's emotions. Specifically, the user's tone of voice and choice of words are input to the emotion engine during a call to estimate the user's emotions. The generative AI's responses and dialogue content are adjusted based on the estimated emotions. For example, if the user shows sadness, words of empathy are used to provide encouragement and support.
The adjustment means detects the tone of the user's voice using the microphone 38B of the smart device 14, analyzes the tone using the control unit 46A, and estimates the emotion using the specific processing unit 290 of the data processing device 12. In addition, a part or all of the adjustment means may be realized by, for example, the specific processing unit 290 of the data processing device 12. The correspondence between each means and the device or control unit is not limited to the above example, and various changes are possible.
The voice data collection means includes a microphone with high sensitivity and noise canceling function. The microphone is equipped with a digital signal processing algorithm to remove ambient noise and clearly record the user's voice. The voice data collection means also employs a highly sensitive microphone that captures subtle acoustic characteristics from the user's speech and combines this with digital signal processing technology to effectively remove ambient noise. The microphone accurately captures the characteristics of the user's voice and provides basic data for detecting changes in emotions.
The voice feature extraction means extracts features that may reflect human emotions from the voice signal. This extraction means includes a spectrogram analysis function and a pitch tracking function for performing acoustic feature analysis, and an emotion identification function using an acoustic model. The voice feature extraction means also uses acoustic analysis tools such as a spectrum analysis function and a pitch analysis function to extract features that suggest emotions from the voice signal, and supplies these data to the emotion estimation model.
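As a non-limiting illustration, the pitch tracking function may be sketched as an autocorrelation search over one frame of audio samples. Autocorrelation is only one of several possible pitch estimation techniques and is an assumption here; the search range of 80 to 400 Hz and the function name estimate_pitch are likewise hypothetical.

```python
def estimate_pitch(samples, sample_rate, fmin=80.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of one audio frame by finding
    the lag with the largest autocorrelation inside the fmin..fmax range.
    Returns 0.0 when no positive correlation is found (e.g. silence)."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame with itself shifted by lag.
        corr = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else 0.0
```

The estimated pitch for successive frames, together with other features such as energy or speaking rate, would form the feature vector supplied to the emotion estimation model.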
The emotion analysis means includes an emotion estimation model based on machine learning, which estimates the user's emotional state using the features extracted by the voice feature extraction means as input. The emotion estimation model is composed of classifiers such as neural networks, support vector machines, and decision trees trained on training data, and classifies the user's emotions into categories such as positive, negative, and neutral. The emotion estimation model improves its accuracy through continuous learning, and evolves so that it can follow subtle changes in the user's emotions. In addition, the emotion analysis means has an emotion estimation model that utilizes machine learning technology, and analyzes the user's emotions based on the extracted voice features. This model combines various machine learning algorithms to identify emotion categories from the user's utterances, and estimates the user's emotional state based on this. The estimated emotional state is important information for adjusting the system's response to the user's utterances and behavior.
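As a non-limiting illustration, the train-then-classify flow of the emotion estimation model may be sketched with a nearest-centroid classifier. This classifier is substituted here for brevity and is not one of the classifiers named in the text (neural networks, support vector machines, decision trees). The function names and the two-element feature vectors used in the usage note are hypothetical.

```python
import math
from collections import defaultdict


def train_centroids(samples):
    """Compute the mean feature vector per emotion label.
    samples is a list of (feature_vector, label) pairs."""
    sums, counts = {}, defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))
```

For example, after training on illustrative pairs such as ([0.9, 0.8], "positive") and ([0.2, 0.1], "negative"), where the two numbers stand for normalized pitch and speaking rate, predict assigns a new feature vector to the nearest category. Continuous learning would correspond to recomputing the centroids as new labeled samples arrive.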
The dialogue adjustment means includes a response generation function using a generative AI model, which dynamically adjusts the content of the dialogue according to the emotion estimated by the emotion analysis means. The response generation function utilizes natural language generation technology to generate a response sentence using words and tones appropriate for the user's emotion. For example, if the user is expressing sad emotion, a response including empathetic expressions and comforting words is generated. This response generation function has a context-aware processing function for taking into account the context of the conversation and providing content that matches the user's emotion and the purpose of the communication. In addition, the generated response is constructed to be natural to the user and to meet the emotional needs. The response generation function is realized by a machine learning model trained on a large-scale conversation dataset and can adapt to the user's language and speaking style. In addition, the dialogue adjustment means includes a response generation function using a generative AI model, which generates dialogue content appropriately corresponding to the estimated emotion. This function selects words and a dialogue tone that correspond to the user's emotion and provides a response that matches the user's current emotion and the context of the conversation. The response generation function is designed to perform context-aware processing and provide the information and support required by the user in an appropriate form. This feature is capable of generating responses that adapt to a user's language and emotions based on models trained from large amounts of conversational data.
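As a non-limiting illustration, one way to make a generative model's response depend on the estimated emotion and on the conversation context is to assemble both into the prompt, as sketched below. The tone instructions, the prompt layout, and the function name build_prompt are hypothetical; the call to the generative model itself is omitted.

```python
# Hypothetical tone instructions keyed by estimated emotion category.
TONE_INSTRUCTIONS = {
    "negative": "Respond gently, acknowledge the user's feelings, and offer support.",
    "positive": "Respond in a bright, upbeat tone.",
    "neutral": "Respond in a clear, matter-of-fact tone.",
}


def build_prompt(emotion, history, user_utterance):
    """Combine a tone instruction, prior turns, and the latest utterance into
    a single prompt string. history is a list of (speaker, text) pairs."""
    tone = TONE_INSTRUCTIONS.get(emotion, TONE_INSTRUCTIONS["neutral"])
    lines = ["Instruction: " + tone, "Conversation so far:"]
    lines += [f"{speaker}: {text}" for speaker, text in history]
    lines += ["User: " + user_utterance, "Assistant:"]
    return "\n".join(lines)
```

Including the prior turns is what gives the sketch its context awareness; an unrecognized emotion label falls back to the neutral instruction.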
Non-sensor-based data collection methods include text information entered directly by users into the system and metadata obtained from call records. These are used as supplementary information when analyzing user behavioral patterns and preferences, and contribute to improving the accuracy of the emotion engine and optimizing the response generation function. User-entered text information is also used as training data for the emotion estimation model as part of the emotion analysis means.
The system analyzes the user's emotions in real time during a call and automatically adjusts the dialogue content. Facial expression recognition technology can be integrated alongside voice data to capture even subtler changes in emotion. A webcam or a camera on a smart device is used to analyze the user's facial expressions and improve the accuracy of emotion analysis. The added facial expression recognition function recognizes the user's emotions more accurately and makes it possible to adjust the dialogue according to these subtler changes.
It is also conceivable to incorporate wearable devices to capture the user's physiological signals. Measuring physiological responses such as heart rate and electrodermal activity can provide deeper insight into emotions that cannot be captured by tone of voice or facial expressions alone. Integrating this data can further improve the accuracy of emotion analysis and generate more appropriate dialogue responses.
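Combining voice, facial expression, and physiological signals can be sketched as a weighted sum of per-modality emotion scores (late fusion). This is only one possible approach and is not stated in the source; the modality names, the weights, and the function `fuse_emotion_scores` are hypothetical, and at least one modality with at least one score is assumed to be present.

```python
def fuse_emotion_scores(modalities, weights):
    """Combine per-modality emotion scores and return the top-scoring emotion.

    modalities: e.g. {"voice": {"neutral": 0.6, "negative": 0.4}, ...}
    weights:    e.g. {"voice": 0.5, "face": 0.3, "physio": 0.2}
    """
    totals = {}
    for name, scores in modalities.items():
        weight = weights.get(name, 0.0)  # modalities without a weight are ignored
        for emotion, score in scores.items():
            totals[emotion] = totals.get(emotion, 0.0) + weight * score
    # Pick the emotion with the highest weighted total.
    return max(totals, key=totals.get)
```

In this sketch a voice signal that leans neutral can be outvoted by facial and physiological signals that lean negative, which is the kind of case the paragraph above describes as not being captured by tone of voice alone.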
It would also be effective to add customization features to the dialogue adjustment means that take into account the user's cultural background and personal values. A user profile can be constructed, and the tone and content of the dialogue can be further personalized based on it. This makes it possible to provide detailed support tailored to each individual user.
Furthermore, to promote the evolution of the emotion estimation model, a mechanism may be built to collect emotion data through crowdsourcing and incorporate feedback from a variety of users to continuously update the model, resulting in a flexible system that can handle a variety of emotional expressions and languages.
Applications in the fields of education and mental health care can also be considered. For example, in the field of education, it could be used to present educational materials adapted to students' emotions or in counseling sessions. In mental health care, it could recognize the user's emotions and provide dialogue support to reduce stress and anxiety. This would enable a more effective approach to the problems the user is facing.
Regarding privacy protection, security measures are strengthened to store users' emotional data safely, and appropriate access control and encryption technology are used to minimize the risk of information leakage. This allows users to use the system with peace of mind.
Ultimately, it is hoped that the personalized interaction experience provided by this system will be developed into services that improve the quality of users' lives.
The embodiment of the present invention is applicable not only to telephone calls but also to video conferencing and online education platforms. For example, a lecturer can grasp the emotions of students in real time and adjust the progress of the curriculum according to the emotions, thereby providing a more effective learning experience. Even in video conferencing, dialogue management that reflects the emotions of participants is performed, promoting a productive and positive meeting environment.
The system could also be enhanced to monitor health conditions based on emotional responses: for example, if a user's tone of voice or manner of speaking indicates negative emotions over a period of time, a mental health professional could be notified to intervene if necessary.
Furthermore, to further improve the emotion engine, a function will be incorporated that will analyze the emotional patterns of the user's daily life and provide advice on long-term emotion management and stress reduction based on that information. By analyzing the user's daily rhythm and activity patterns and predicting emotional ups and downs, the system will suggest content for relaxation and motivation at appropriate times.
This system is also expected to be applied in the field of customer support. For example, if call center operators could grasp the customer's emotions in real time and detect negative emotions such as dissatisfaction or anger, they could take immediate action to address the issue, contributing to improving customer satisfaction.
In the fields of games and entertainment, dynamic changes in content based on the user's emotions are expected to have the effect of increasing immersion and enjoyment. In-game characters will react to the player's emotions, changing the story development and dialogue, creating a more personalized experience.
In addition, voice assistants and virtual reality (VR) may be integrated to realize more natural dialogue based on the user's emotions. A voice assistant understands the user's emotions and provides information and services tailored to individual needs. In a VR environment, the scenario and environment change according to the user's emotions, providing an experience that matches those emotions in real time.
This system also has the potential to revolutionize the design of user interfaces (UI) and user experiences (UX). Using emotion recognition technology, it can provide UI and UX optimized for the user's emotions, increasing user satisfaction. For example, websites and applications can grasp the user's emotions in real time and adjust the way content is presented and the form of interaction.
Ultimately, it is hoped that the emotion regulation function provided by this system will become widely used throughout society as a tool to improve the quality of human relationships and increase the effectiveness of communication.

（形態例１）
ステップ１：未登録または非通知の電話番号からの着信があった場合、生成系AIが応答する。
ステップ２：通話終了後、用件をチャットGPTで文書化する。
ステップ３：文書化された用件を感情エンジンに入力し、ユーザの感情を分析する。
ステップ４：分析結果に基づいて、文書の表現を調整する。
ステップ５：調整された文書をメッセンジャーアプリやメールで送信する。
（形態例２）
ステップ１：通話終了後、用件をチャットGPTで文書化する。
ステップ２：文書化された用件を感情エンジンに入力し、ユーザの感情を分析する。
ステップ３：分析結果に基づいて、文書の表現を調整する。
ステップ４：調整された文書を選択し、送信先を指定する（例：家族）。
ステップ５：指定された送信先に文書をメッセンジャーアプリやメールで送信する。
（形態例３）
ステップ１：通話中にユーザの声のトーンや言葉の選択などを感情エンジンに入力し、ユーザの感情を分析する。
ステップ２：分析結果に基づいて、生成系AIの応答や対話の内容を調整する。
ステップ３：通話終了後、用件をチャットGPTで文書化する。文書化された用件は、感情エンジンに入力するために使用される。 (Example 1)
Step 1: When a call comes in from an unregistered or withheld phone number, the generative AI answers.
Step 2: After the call ends, the matter discussed is documented using ChatGPT.
Step 3: The documented matter is input into the emotion engine to analyze the user's emotions.
Step 4: Based on the analysis results, the wording of the document is adjusted.
Step 5: The adjusted document is sent via a messenger app or email.
(Example 2)
Step 1: After the call ends, the matter discussed is documented using ChatGPT.
Step 2: The documented matter is input into the emotion engine to analyze the user's emotions.
Step 3: Based on the analysis results, the wording of the document is adjusted.
Step 4: The adjusted document is selected and a recipient is specified (e.g., family).
Step 5: The document is sent to the specified recipient via a messenger app or email.
(Example 3)
Step 1: During a call, the user's tone of voice, choice of words, etc. are input into the emotion engine to analyze the user's emotions.
Step 2: Based on the analysis results, the generative AI's responses and the dialogue content are adjusted.
Step 3: After the call ends, the matter discussed is documented using ChatGPT. The documented matter is used as input to the emotion engine.
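The post-call steps of Examples 1 and 2 (document, analyze, adjust, send) can be sketched as a small pipeline. This is a sketch under stated assumptions: the function `document_and_send` and its callable parameters are hypothetical placeholders for the summarization model, the emotion engine, and the messenger or email integration, none of whose interfaces are given in the source.

```python
from typing import Callable


def document_and_send(
    transcript: str,
    summarize: Callable[[str], str],
    analyze_emotion: Callable[[str], str],
    adjust: Callable[[str, str], str],
    send: Callable[[str, str], None],
    recipient: str = "user",
) -> str:
    """Run the post-call steps in order and return the document that was sent."""
    document = summarize(transcript)      # document the matter after the call
    emotion = analyze_emotion(document)   # input the document into the emotion engine
    adjusted = adjust(document, emotion)  # adjust the wording based on the analysis
    send(recipient, adjusted)             # deliver via a messenger app or email
    return adjusted
```

Passing `recipient="family"` corresponds to Step 4 of Example 2; the default of `"user"` is an assumption made for this sketch.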

特定処理部２９０は、特定処理の結果をスマートデバイス１４に送信する。スマートデバイス１４では、制御部４６Ａが、出力装置４０に対して特定処理の結果を出力させる。マイクロフォン３８Ｂは、特定処理の結果に対するユーザ入力を示す音声を取得する。制御部４６Ａは、マイクロフォン３８Ｂによって取得されたユーザ入力を示す音声データをデータ処理装置１２に送信する。データ処理装置１２では、特定処理部２９０が音声データを取得する。 The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating a user input for the result of the specific processing. The control unit 46A transmits audio data indicating the user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

データ生成モデル５８は、いわゆる生成ＡＩ（Artificial Intelligence）である。データ生成モデル５８の一例としては、ＣｈａｔＧＰＴ（インターネット検索＜URL: https://openai.com/blog/chatgpt＞）等の生成ＡＩが挙げられる。データ生成モデル５８は、ニューラルネットワークに対して深層学習を行わせることによって得られる。データ生成モデル５８には、指示を含むプロンプトが入力され、かつ、音声を示す音声データ、テキストを示すテキストデータ、及び画像を示す画像データ等の推論用データが入力される。データ生成モデル５８は、入力された推論用データをプロンプトにより示される指示に従って推論し、推論結果を音声データ及びテキストデータ等のデータ形式で出力する。ここで、推論とは、例えば、分析、分類、予測、及び／又は要約等を指す。特定処理部２９０は、データ生成モデル５８を用いながら、上述した特定処理を行う。 The data generation model 58 is a so-called generative AI (Artificial Intelligence). An example of the data generation model 58 is generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>). The data generation model 58 is obtained by performing deep learning on a neural network. A prompt including an instruction is input to the data generation model 58, together with inference data such as voice data indicating a voice, text data indicating text, and image data indicating an image. The data generation model 58 performs inference on the input inference data according to the instruction indicated by the prompt, and outputs the inference result in a data format such as voice data or text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization. The specific processing unit 290 performs the above-mentioned specific processing while using the data generation model 58.

上記実施形態では、データ処理装置１２によって特定処理が行われる形態例を挙げたが、本開示の技術はこれに限定されず、スマートデバイス１４によって特定処理が行われるようにしてもよい。 In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology disclosed herein is not limited to this, and the specific processing may also be performed by the smart device 14.

［第２実施形態］ [Second embodiment]

図３には、第２実施形態に係るデータ処理システム２１０の構成の一例が示されている。 Figure 3 shows an example of the configuration of a data processing system 210 according to the second embodiment.

図３に示すように、データ処理システム２１０は、データ処理装置１２及びスマート眼鏡２１４を備えている。データ処理装置１２の一例としては、サーバが挙げられる。 As shown in FIG. 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

スマート眼鏡２１４は、コンピュータ３６、マイクロフォン２３８、スピーカ２４０、カメラ４２、及び通信Ｉ／Ｆ４４を備えている。コンピュータ３６は、プロセッサ４６、ＲＡＭ４８、及びストレージ５０を備えている。プロセッサ４６、ＲＡＭ４８、及びストレージ５０は、バス５２に接続されている。また、マイクロフォン２３８、スピーカ２４０、及びカメラ４２も、バス５２に接続されている。 The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication I/F 44. The computer 36 includes a processor 46, a RAM 48, and a storage 50. The processor 46, the RAM 48, and the storage 50 are connected to a bus 52. The microphone 238, the speaker 240, and the camera 42 are also connected to the bus 52.

マイクロフォン２３８は、ユーザが発する音声を受け付けることで、ユーザから指示等を受け付ける。マイクロフォン２３８は、ユーザが発する音声を捕捉し、捕捉した音声を音声データに変換してプロセッサ４６に出力する。スピーカ２４０は、プロセッサ４６からの指示に従って音声を出力する。 The microphone 238 receives instructions and the like from the user by receiving voice uttered by the user. The microphone 238 captures the voice uttered by the user, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs the voice according to instructions from the processor 46.

カメラ４２は、レンズ、絞り、及びシャッタ等の光学系と、ＣＭＯＳ（Complementary Metal-Oxide-Semiconductor）イメージセンサ又はＣＣＤ（Charge Coupled Device）イメージセンサ等の撮像素子とが搭載された小型デジタルカメラであり、ユーザの周囲（例えば、一般的な健常者の視界の広さに相当する画角で規定された撮像範囲）を撮像する。 Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an imaging element such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures the user's surroundings (e.g., an imaging range defined by an angle of view equivalent to the field of vision of a typical able-bodied person).

通信Ｉ／Ｆ４４は、ネットワーク５４に接続されている。通信Ｉ／Ｆ４４及び２６は、ネットワーク５４を介してプロセッサ４６とプロセッサ２８との間の各種情報の授受を司る。通信Ｉ／Ｆ４４及び２６を用いたプロセッサ４６とプロセッサ２８との間の各種情報の授受はセキュアな状態で行われる。 The communication I/F 44 is connected to the network 54. The communication I/Fs 44 and 26 are responsible for the exchange of various information between the processor 46 and the processor 28 via the network 54. The exchange of various information between the processor 46 and the processor 28 using the communication I/Fs 44 and 26 is performed in a secure state.

図４には、データ処理装置１２及びスマート眼鏡２１４の要部機能の一例が示されている。図４に示すように、データ処理装置１２では、プロセッサ２８によって特定処理が行われる。ストレージ３２には、特定処理プログラム５６が格納されている。 Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, in the data processing device 12, a specific process is performed by the processor 28. A specific process program 56 is stored in the storage 32.

特定処理プログラム５６は、本開示の技術に係る「プログラム」の一例である。プロセッサ２８は、ストレージ３２から特定処理プログラム５６を読み出し、読み出した特定処理プログラム５６をＲＡＭ３０上で実行する。特定処理は、プロセッサ２８がＲＡＭ３０上で実行する特定処理プログラム５６に従って、特定処理部２９０として動作することによって実現される。 The specific processing program 56 is an example of a "program" according to the technology of the present disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as the specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

スマート眼鏡２１４では、プロセッサ４６によって受付出力処理が行われる。ストレージ５０には、受付出力プログラム６０が格納されている。プロセッサ４６は、ストレージ５０から受付出力プログラム６０を読み出し、読み出した受付出力プログラム６０をＲＡＭ４８上で実行する。受付出力処理は、プロセッサ４６がＲＡＭ４８上で実行する受付出力プログラム６０に従って、制御部４６Ａとして動作することによって実現される。 In the smart glasses 214, the reception output process is performed by the processor 46. A reception output program 60 is stored in the storage 50. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output process is realized by the processor 46 operating as the control unit 46A in accordance with the reception output program 60 executed on the RAM 48.

（形態例１）
本発明を実施するための形態は、未登録または非通知の電話番号からの着信時に、生成系AIが応答する応答手段と、通話終了後に用件をチャットGPTで文書化し、メッセンジャーアプリやメールで送信する送信手段とを具備するシステムである。さらに、家族や知人と名乗る場合には、専用の質問や合言葉を使用して確認する確認手段と、配送業者は業者用の合言葉を使用して本人につながる接続手段と、通話中には電話番号をチェックし、悪用歴のある番号の場合は警察に連携する連携手段とを具備する。
応答手段は、データ処理装置１２の特定処理部２９０によって実現され、未登録または非通知の電話番号からの着信に対して、生成系AIを用いて応答する。送信手段は、通話終了後に用件をチャットGPTで文書化し、メッセンジャーアプリやメールで送信する機能として、データ処理装置１２の特定処理部２９０によって実現される。確認手段は、専用の質問や合言葉を使用して家族や知人を確認する機能として、スマート眼鏡２１４の制御部４６Ａによって実現されることができる。接続手段は、配送業者が業者用の合言葉を使用して本人につながる機能として、スマート眼鏡２１４の制御部４６Ａによって実現されることができる。連携手段は、通話中に電話番号をチェックし、悪用歴のある番号の場合は警察に連携する機能として、データ処理装置１２の特定処理部２９０によって実現される。各手段と装置や制御部との対応関係は、上述した例に限定されず、種々の変更が可能である。
応答手段は、音声認識技術を活用した発話理解機能が含まれ、未登録または非通知の番号からの着信に対して、事前に訓練されたAIが生成した自然言語による応答を行う。このAIは、大量の通話データを学習しており、さまざまなシナリオに対応できる回答を生成する能力を有する。着信があると、AIはリアルタイムで着信内容を解析し、適切な応答を選択して会話を進める。応答内容は、通話の性質に合わせてカスタマイズされるため、営業電話からの質問には丁寧に対応し、不審な着信に対しては慎重に応答する。また、応答手段は、未登録または非通知の電話番号からの着信に対応するため、音声合成技術と発話内容理解技術を駆使し、事前に訓練を受けたAIが適切な対応を行うこともできる。このAIは複数の応答シナリオを学習し、通話の意図を把握して、状況に応じた自然な対話を提供することができる。着信内容に基づいてAIが瞬時に応答を選択し、通話の趣旨に合わせてカスタマイズされた対応を行うため、営業電話の問い合わせには適切に対処し、怪しい着信には慎重に応じることができる。
送信手段には、通話内容をテキスト化する音声認識技術と、そのテキストを基に通話の要約を生成する自然言語生成技術が含まれる。通話が終了すると、AIは通話内容をテキストデータに変換し、その内容を要約して文書化する。生成された文書は、メッセンジャーアプリやメールなどのコミュニケーション手段を介して、ユーザーに送信される。このプロセスにより、ユーザーは通話の詳細を後からでも確認でき、重要な情報を見逃すことがない。また、送信手段では、終了した通話の内容を音声認識技術によりテキストに変換し、その後、自然言語処理技術を用いて要約を行い、生成された文書をメッセンジャーアプリやメールでユーザーに送付することもできる。ユーザーは送付された文書を介して通話の詳細をいつでも確認可能で、重要な情報が記録され、見落とされることがなくなる。
確認手段は、家族や知人からの着信と名乗る場合に、特定の質問や合言葉を使用して本人確認を行う機能を有する。この手段は、ユーザーが設定した質問や合言葉をAIが使用し、着信者が正しい応答を行うことで本人であることを確認する。本人確認が成功すると、AIは通話を継続し、必要に応じてユーザーに転送する。このプロセスにより、不正アクセスや詐欺を防ぎながら、本人であれば円滑な通信を確保する。また、確認手段は、家族や知人を名乗る着信者に対して、ユーザーが設定した質問や合言葉をAIが提示し、正確な回答が得られた場合にのみ、通話を継続またはユーザーに転送することもできる。この機能により、不正なアクセスや詐欺を未然に防ぎつつ、本人であることが確認された場合にはスムーズなコミュニケーションを実現する。
接続手段は、配送業者などの特定の職種の者が業者用の合言葉を使用することで本人と通話ができる機能を有する。この手段では、AIが業者用の合言葉を受け取り、正しい合言葉であることを確認した後、通話を本人に繋ぐ。業者用の合言葉は、通話のセキュリティを確保するために重要であり、誤った合言葉が入力された場合、通話は繋がらない。また、接続手段は、配送業者などの職種固有の合言葉をAIが受け取り、その正当性を確認した上で、通話をユーザーにつなぐ機能を担うこともできる。この合言葉によって、通話のセキュリティが確保され、誤った合言葉の場合には通話が切断される仕組みとなっている。
連携手段は、通話中に電話番号をリアルタイムでチェックし、その番号に悪用歴がある場合は自動的に警察や関連機関に連携する機能を有する。この手段は、データベースに蓄積された不審な番号のリストを参照し、着信番号がそのリストに含まれているかを確認する。不審な番号からの通話であると判断された場合、システムは即座に警察への報告プロトコルを開始し、状況に応じて適切な対応を取る。また、連携手段は、通話中の番号をリアルタイムで監視し、過去に悪用された履歴がある番号であれば自動的に警察などの関連機関と連携する機能を果たすこともできる。不審な番号の検出時には、警察への通報プロトコルが起動し、迅速な対応が取られる。
これらの手段は、ユーザーのセキュリティと利便性を高めるために、複数のAI技術と連携機能を組み合わせて実装される。データ処理装置とスマートデバイスの制御部が協力し、これらの手段を柔軟に実現するためのプラットフォームが提供される。各手段は、独立して機能するだけでなく、連動してより高度なサービスを提供するために設計される。ユーザーは、これらの手段を通じて、未登録または非通知の電話番号からの着信に対する対応を自動化し、日常生活の中で発生する潜在的なリスクから自己を守ることができる。また、これらの手段は、AI技術を中心に構築され、ユーザーのセキュリティを向上させると同時に、日常のコミュニケーションを効率化する。センサーを使用しないデータ収集の代替手段として、ユーザーが手動で情報を入力し、システムがその情報を利用するプロセスが可能である。ユーザーはこれらの手段によって、未登録または非通知の着信に対する対処を自動化し、不測の事態からの保護を強化することができる。
本形態例のシステムには、さらなる機能を追加し、その利便性を高めることができる。例えば、応答手段には、会話中にユーザーの感情を解析する機能を追加し、通話相手の声のトーンや話し方から感情を推定し、それに応じた応答をAIが行う。これにより、通話相手が怒りや不安を感じている場合は、より慎重かつ共感的な応答が可能となり、通話の品質を向上させる。
送信手段には、文書化した通話内容をユーザーのカレンダーやリマインダーに自動的に統合する機能を追加する。これにより、通話で得たアポイントメントやタスクを忘れずに管理できるようになり、生産性の向上を図ることができる。
確認手段は、ユーザーの生体情報を利用して本人確認を行う機能を備える。例えば、スマートウォッチやフィットネストラッカーから得た生体情報を用いて、通話相手が実際に家族や知人であることを確認する。これにより、より高度なセキュリティを実現しつつ、利用者の手間を減らすことができる。
接続手段には、配送業者がQRコード（登録商標）またはNFCタグをスキャンすることで認証を行い、通話を本人に直接繋ぐ機能を追加する。これにより、合言葉のやり取りを省略し、より迅速かつ安全に配送業者との連携を図ることが可能となる。
連携手段では、通話中の音声データから不審なキーワードを検出し、その内容を自動で分析する機能を追加する。キーワードに基づいて不審な通話と判断された場合、警察への通報を行う前に、まずはユーザーに警告することで、より正確な判断をサポートする。
これらの追加機能により、本形態例のシステムは、通話の自動応答だけでなく、日々の生活の中でのセキュリティやスケジュール管理をより効率的にサポートする。また、各種のAI技術とデータベースの連携によって、ユーザーの生活に更なる安心と便利さを提供する。さらに、これらの技術をユーザーのスマートデバイスや家庭内のIoT機器と統合することで、よりパーソナライズされた体験を実現することが可能となる。
本形態例のシステムには、さらなる機能拡張が可能であり、利便性およびセキュリティを高めるために応答手段には、発信者の声紋を分析し、登録済みの家族や知人の声と照合する機能を追加することができる。これにより、発信者が本人であるかをより正確に判断し、特定の質問や合言葉を使用しなくても安全に通話を継続することが可能になる。
送信手段では、音声認識と自然言語処理を活用して得られた通話内容を、ユーザーが選択した複数の言語に翻訳し、異なる言語を話すユーザー間のコミュニケーションを支援する機能を組み入れることができる。これにより、国際的なビジネスシーンや多言語を話す家庭内での利用が促進される。
確認手段には、ユーザーが特定のジェスチャーや動作をカメラに示すことで本人確認を行う機能を追加することができる。この機能により、通話中に手軽にかつ迅速に本人確認を行うことが可能となり、セキュリティが一層強化される。
接続手段に関しては、AIが発信者の位置情報と配送予定情報を照合し、本人確認を行う機能を追加することで、配送業者が実際に商品を配送している場所から通話していることを確認し、通話の真正性を保証することが可能になる。
連携手段では、不審な通話が検出された際に、警察だけでなく、ユーザーの指定した緊急連絡先にも自動通報する機能を追加することで、万が一の緊急時に迅速な対応が行えるようになる。
これらの機能追加は、既存のシステムの基本的な構造を維持しつつ、ユーザーのニーズに合わせて柔軟な対応が可能となる。また、スマートデバイスや家庭内のIoT機器と連携することで、ユーザーが日常的に使用する環境においてもシームレスな経験を提供する。これらの追加機能により、本形態例のシステムは通話の自動応答を超え、日常生活のあらゆる面でユーザーのセキュリティと利便性を向上させる。
（形態例２）
本発明を実施するための形態は、通話終了後に用件をチャットGPTで文書化し、メッセンジャーアプリやメールで送信する際に、家族宛に送信可能な送信手段を具備するシステムである。具体的には、通話終了後に生成された文書を選択し、送信先を家族と指定することで、家族宛に用件を送信することができる。
送信手段は、通話終了後に生成された文書を選択し、送信先を家族と指定することで、家族宛に用件を送信する機能として、データ処理装置１２の特定処理部２９０によって実現される。また、この送信手段は、スマート眼鏡２１４の制御部４６Ａによって実現されることも可能である。各手段と装置や制御部との対応関係は、上述した例に限定されず、種々の変更が可能である。
通話終了後の内容を文書化する手段は、音声認識技術を用いた会話内容解析手段を含む。この解析手段は、通話の音声データをテキストに変換し、通話の要点を抽出する。また、音声認識技術を用いた会話内容解析手段は、通話が終了した後に発生する音声データをテキストに変換し、そのテキストデータから通話の主要なポイントを抽出する機能を持つ。音声認識技術は、ディープラーニングに基づくモデルを活用し、さまざまな言語やアクセントに対応するためのトレーニングが行われる。また、この解析手段は、多様な言語やアクセントに対応するためにディープラーニングモデルを活用し、大量の音声データを基に学習を進める。会話内容解析手段は、形態素解析や構文解析を行い、会話の中で交わされた重要な情報や行動を要する内容を特定する。特に、キーワード抽出機能を用いて会話の中で頻繁に使われる単語やフレーズを特定し、それらを基に通話の要点を文書化する。また、会話の要点を明確にするために、形態素解析機能と構文解析機能を組み合わせ、会話中に交わされた重要な情報やアクションを要する内容を特定する。キーワード抽出機能は、会話の中で頻出する単語やフレーズを識別し、それらの情報を基に文書化する。
文書化された内容をメッセージとしてフォーマットする手段は、テキストエディタ機能を含む。この機能は、解析されたテキストを整理し、文書の構造を整える。メッセージフォーマット手段は、ユーザが容易に内容を確認し、必要に応じて編集や追加情報を加えることができるインタフェースを提供する。また、文書化された内容をメッセージ形式に整える手段には、テキストエディタ機能が含まれ、解析されたテキストを整理し、文書のレイアウトを整える。ユーザがメッセージの内容を確認し、編集や追加情報を加えることができるインターフェイスを提供する。
送信手段は、メッセージを宛先指定機能を用いて特定の家族に送信する。この宛先指定機能は、ユーザの連絡先リストと連携し、選択された家族メンバーの連絡先情報に基づいてメッセージを自動的に送信する。また、送信手段は、メッセージ送信の確認と送信履歴を管理する機能も備えており、ユーザは送信されたメッセージの状態を追跡し、必要に応じて再送信を行うことができる。メッセージの送信は、メッセンジャーアプリやメールアプリとの連携によって行われる。メッセンジャーアプリやメールアプリとの連携機能は、ユーザのアカウント情報を用いて認証を行い、安全にメッセージを送信する。また、送信手段は、セキュリティ対策としてメッセージの暗号化や送信時の認証プロセスを実施し、プライバシーの保護を確保する。また、送信手段は、ユーザが選択したメッセージ送信方法に応じて、メッセージを適切なフォーマットで送信する。例えば、メッセンジャーアプリではインスタントメッセージとして、メールアプリでは電子メールとして送信する。メッセージの送信手段は、ユーザが選択した家族メンバーに対してメッセージを送るための宛先指定機能を備えており、ユーザの連絡先リストと連携して、家族メンバーの連絡先に基づいたメッセージの自動送信を行う。送信手段には、メッセージの送信状況を確認し、送信履歴を管理する機能も含まれており、ユーザは送信したメッセージの状態を追跡し、必要に応じて再送信を行うことができる。メッセージの送信プロセスは、メッセンジャーアプリやメールアプリと連携し、ユーザのアカウント情報を基に認証を行い、安全にメッセージを送信する機能を持つ。送信手段は、メッセージの内容を暗号化し、送信時の認証を行うセキュリティ対策を施して、プライバシーを守る。また、メッセンジャーアプリではインスタントメッセージとして、メールアプリでは電子メールとして、ユーザが選択した送信方法に応じて適切なフォーマットでメッセージを送信する。
センサーを含まないデータ収集の例としては、ユーザが手動で通話内容をメモする場合が考えられる。この場合、ユーザは通話終了後に自分で通話の要点を記録し、そのテキストデータをメッセージとして家族に送信する。ユーザが手動で記録した通話内容は、テキストエディタ機能を用いて整理され、フォーマットされたメッセージとして送信手段によって家族宛に送信される。この手動記録は、音声認識技術を用いた自動文書化が適用できない状況や、ユーザが特定の情報を自らの言葉で伝えたい場合に適している。また、センサーを用いないデータ収集の方法として、ユーザが通話終了後に手動で通話内容をメモし、その情報をメッセージとして家族に送るシナリオも考えられる。この手動記録は、テキストエディタ機能を用いて整理され、フォーマットされたメッセージとして送信される。音声認識技術が適用できない状況や、ユーザが自らの言葉で特定の情報を伝えたい場合に利用される。
本発明の実施形態では、通話内容の文書化を超えた機能を考慮することができる。例えば、音声認識によって生成されたテキストに基づき、スケジュール管理システムと連携して、通話中に言及されたアポイントメントや予定を自動的にカレンダーに登録する機能を追加する。これにより、ユーザーは通話後に手動でスケジュールを管理する手間を省くことができる。さらに、家族間で共有されるカレンダーへの予定登録を提案し、家族全員が予定を共有しやすくする。
また、文書化されたテキストデータを基に、自動的にタスクリストを生成し、家族全員がアクセスできる共有プラットフォームに投稿する機能を設けることも可能である。このプラットフォームでは、各家族メンバーがタスクの進捗を更新したり、完了したタスクにチェックを入れることができ、家族全員で情報を共有し協力する環境を構築する。
さらに、文書化されたメッセージに対して、感情分析を行い、通話中の感情的なニュアンスをテキストに反映させる機能を追加することで、メッセージの受取人が発信者の意図をより正確に理解することを助ける。例えば、通話中に喜びや心配といった感情が表れた場合、その感情をテキストに特定の絵文字やフォーマットで表現し、コミュニケーションの豊かさを高める。
また、音声認識と解析を活用して、通話内容から自動的にFAQやよくある質問リストを生成し、家族が同様の問い合わせをする際に参照できる知識ベースを構築する機能も考えられる。この知識ベースは、家族内で共有され、新たな通話が発生するたびに更新されることで、家族間のコミュニケーションの効率を向上させる。
さらに、通話終了後に生成されるテキストは、メッセンジャーアプリやメールで送信するだけでなく、音声形式で再生する機能を付加することで、視覚障害のある家族メンバーや読み書きが苦手な子供でも情報を容易に受け取れるようにする。
最後に、通話終了後に文書化された内容を、家族メンバーのプライバシーを保護するために、文書内の機微な情報を識別し、自動的に匿名化や伏せ字処理を行う機能を組み込むことで、安心して情報を共有できる環境を提供する。これにより、個々のプライバシーを尊重しつつ、必要な情報のみを共有するバランスを保つことができる。
本発明の実施形態は、通話内容の自動文書化と送信に関するものであり、これに新たな機能を追加することが考えられる。例えば、通話内容を分析し、通話終了後に自動でアクションアイテムを生成し、対応が必要なタスクとしてユーザーのスマートデバイスにリマインダーをセットする機能が考えられる。このリマインダーは、通話で言及された期限や重要性に基づいて優先度を設定し、ユーザーが忘れずに行動に移せるようサポートする。
さらに、家族間でのコミュニケーションを強化するために、文書化されたメッセージ内の特定の単語やフレーズに基づいて、関連する画像やビデオ、リンクを自動的に添付する機能を追加することも有益である。これにより、テキストベースのメッセージだけでなく、視覚的な情報も共有でき、コミュニケーションがより豊かになる。
また、通話内容の文書化に際して、プライバシーに配慮し、特定の個人情報や機密情報を自動的に検出し、ブラー処理や伏せ字に変換する機能を実装することで、セキュリティを高めることができる。このプロセスは、自然言語処理技術とプライバシー保護のガイドラインに従って行われる。
通話内容のテキスト化では、ユーザーの多様なニーズに対応するため、複数の言語への翻訳機能を組み込むことも有効である。家族が異なる言語を話す多文化の環境では、通話内容を自動的に翻訳し、各メンバーが理解しやすい言語でメッセージを送信することが可能となる。
さらなる利便性を追求するために、メッセージの送信タイミングをユーザーがカスタマイズできるスケジュール機能を追加する。ユーザーは、即時送信だけでなく、特定の日時にメッセージを送信するよう設定できるため、家族が情報を受け取るタイミングを最適化できる。
最後に、メッセージの受け取り側で、受信したメッセージに対するアクションを簡単にとれるよう、返信や確認のためのクイックアクションボタンを設けることで、迅速なフィードバックと効率的なコミュニケーションを実現する。これにより、家族間での情報共有がさらにスムーズに行われるようになる。
（形態例３）
本発明を実施するための形態は、通話中に電話番号をチェックし、悪用歴のある番号の場合には警察に連携する際に、警察との連携手段を具備するシステムである。具体的には、通話中に着信番号をデータベースと照合し、悪用歴のある番号であることを検出した場合には、自動的に警察に通報する機能を備えている。
警察との連携手段は、通話中に着信番号をデータベースと照合し、悪用歴のある番号であることを検出した場合に自動的に警察に通報する機能として、データ処理装置１２の特定処理部２９０によって実現される。この連携手段は、スマート眼鏡２１４の制御部４６Ａによって実現されることも可能である。各手段と装置や制御部との対応関係は、上述した例に限定されず、種々の変更が可能である。
抽出手段は、通信網を介して発信される各通話の信号に含まれる発信者情報を抽出するために備わっており、デジタル信号処理技術を用いて通話データから発信者の電話番号を正確に取得することができる。また、抽出手段は、Caller ID情報を解析し、電話番号を特定するための信号解析機能を持っている。このシステムは、通話信号の中から発信者の電話番号を抽出するための信号抽出機能と、Caller ID情報の解読に特化した解析アルゴリズムを利用して、着信番号を正確に特定することもできる。
照合手段は、抽出された電話番号を不正利用が疑われる電話番号を収集し、カテゴリ別に整理したデータベースに照合する機能を有しており、データベース管理システムを通じてリアルタイムでの照合処理が可能であり、高速な検索アルゴリズムとインデックス技術により、通話が進行している間に迅速な照合が行われる。また、照合手段は、悪用歴のある番号をリスト化したデータベースを参照し、迅速な検索と照合を行うための高性能なデータベース検索機能とインデックス機能を備えている。
連携手段は、悪用歴のある番号が検出された場合に自動的に警察に通報する機能を持ち、通報システムとのインターフェイス機能が含まれ、通報する際の警察の受付システムとのプロトコルに基づいたデータ形式で通報情報を生成し、安全な通信チャネルを用いて警察の受付システムに送信する。通報情報の内容には、検出された悪用歴のある電話番号、通話の日時、通話の持続時間、発信者が利用している通信事業者などの情報が含まれ、個人情報の保護や通報の正確性を確保するために、暗号化技術や認証システムが用いられる。また、連携手段は、複数の通信プロトコルや通報システムとの互換性を持ち、システム間のデータ交換を円滑に行うためのアダプター機能を設け、通報のプロセスにおいて、警察の受付システムの要件に合わせて通報情報のフォーマットを調整し、適切な通報プロトコルを選択して通報を行う機能を持つ。通報プロセスが発動されると、システムは警察の受付システムに対して、通報情報を送信し、通報の受付確認を取得する。この確認は、通報が正しく行われたことをシステムが記録し、通報履歴として保存するための情報として利用される。通報履歴は、将来的な分析や改善のために用いられ、通報プロセスの効率化や精度向上に寄与する。この連携機能は、通報データ生成機能により、必要な情報を含む通報フォーマットを作成し、セキュアな通報送信機能を介して警察の受付システムへ情報を送信する。通報のセキュリティと信頼性を保証するため、データの暗号化機能とシステム認証機能が導入されている。また、システムの連携手段は、複数の通報プロトコルと互換性を持ち、異なる通報システムとのデータ交換を実現するアダプター機能を有しており、このアダプター機能は、通報プロトコル選択機能により、通報時のプロトコル要件に適した形式に自動的に調整し、通報情報の送信と受付確認を行う。通報履歴記録機能は通報の成功を記録し、システムのパフォーマンス分析や改善に使用される。
データ収集手段には、センサーを用いない例として、ユーザがアプリケーションやウェブインターフェース上で疑わしい通話に関する報告を行う機能があり、ユーザは、通話の経験や通話中に感じた不審な点をフォームに入力し、その情報がデータベースに登録される。この手動報告により収集されたデータは、自動的にデータベースに照合される電話番号のリストに追加される可能性があり、悪用歴のある番号の検出精度の向上に寄与する。また、センサーを使用しないデータ収集例として、ユーザが疑わしい通話について報告するための入力機能が提供される。ユーザはインタラクティブな報告フォームを通じて、疑わしい通話の内容をデータベースに登録し、これにより収集された情報は悪用歴のある番号の検出に活用される。この手動報告システムは、ユーザの経験と感覚に基づいて追加データを提供し、番号照合データベースの拡張に貢献する。
このシステムには、データベースの更新メカニズムを強化する機能を追加することができる。例えば、新たに悪用が確認された番号は、通報後も自動的にデータベースに追加される。さらに、疑わしい通話が報告された際には、その番号の信用情報を他のデータベースとも照合し、ユーザーからの報告に基づく情報と組み合わせることで、より正確な悪用歴の特定を実現する。データベースの整合性を保つために、定期的なクリーニングプロセスを実行し、誤った情報や古いデータを排除する仕組みも設けられる。また、通報システムとの連携を強化するため、警察が提供する犯罪データベースと直接連携し、照合プロセス中にリアルタイムで犯罪情報を取得し、照合結果の精度を向上させる機能も導入される。
通報の即時性を高めるために、通話が開始された瞬間に照合プロセスが開始され、悪用歴のある番号が検出された場合、通話者に警告音を出すか、自動的に通話を遮断するオプションも設けられる。さらに、通話を遮断した際には、通報者に代わって通話内容の録音を保存し、警察の調査に役立てることができる。警察が介入する際には、通報者の位置情報や通話履歴を含む詳細なレポートが自動生成され、犯罪捜査の迅速化を支援する。
ユーザインターフェースには、通報システムの透明性を高めるために、通報プロセスの進行状況をリアルタイムで確認できる機能が追加される。通報の結果や警察からのフィードバックをユーザが確認できるようにすることで、システムへの信頼性を向上させる。また、悪用歴のある番号に関する統計データやトレンド分析を提供し、ユーザが通話に対する警戒心を持つための情報提供も行われる。
さらに、悪用歴のある番号を特定するための機械学習技術を導入し、通話パターンや通話の頻度などの様々な指標を分析することで、悪用の可能性が高い新たな番号を予測する。これにより、データベースの予防的な更新が可能となり、未知の犯罪行為を防ぐための対策を強化する。また、ユーザが通報システムの効果について直接フィードバックを提供できる機能を設け、システムの改善に役立てる。フィードバックは匿名で行われることで、ユーザのプライバシーを保護しつつ、システムの改善に資する貴重な情報を収集する。
通話中の電話番号チェックをより効果的にするためには、ユーザーが直面する可能性のある様々な詐欺のパターンをAIが学習し、特定の単語やフレーズが通話中に検出された際にリアルタイムでフラグを立てる機能を実装する。これにより、単に電話番号がデータベース内の悪用歴と一致するだけでなく、通話の内容からも悪意を持った行動を推測し、検出することが可能になる。また、ユーザーが詐欺を疑う通話を簡単に報告できるショートカットやボタンをスマートフォンのインターフェースに設け、報告プロセスを簡略化する。これにより、データベースはより迅速に更新され、他のユーザーに対する保護が向上する。
警察との連携を強化するためには、通報された情報を基に警察が迅速に対応できるよう、通報システムに位置情報追跡機能を統合し、犯罪者の追跡と捕捉を支援する。また、通報システムに組み込まれる人工知能は、通報データから犯罪パターンを分析し、予防的な警戒活動を計画するための情報を警察に提供する。このような予測分析を活用することで、将来的な犯罪を未然に防ぐことに繋がる。
警察とのデータ共有を促進するために、警察が把握している詐欺事件やその他の犯罪に関する情報をリアルタイムで受け取り、データベースを更新する機能を設ける。これにより、通報システムは最新の犯罪情報に基づいて機能し、ユーザーを守るための対策が強化される。さらに、システムにはブロックリスト機能を追加し、ユーザーが自身で疑わしい番号を登録して通話を拒否できるようにする。これにより、ユーザー自身が直接リスクをコントロールすることが可能になる。
教育プログラムとして、ユーザーが詐欺の手口を認識し、予防するための情報を提供するオンライン講座やワークショップを開催する。これにより、ユーザーは自分自身を守るための知識を得ることができ、社会全体のセキュリティ意識が向上する。また、通報システムの利用によって防がれた詐欺事件の事例を共有し、ユーザーがシステムの実効性を理解しやすくする。
最後に、システムのアップデートを通じて、通話が詐欺である可能性が高いと判断された場合に、ユーザーに自動的に警告メッセージを送信し、詐欺に対する警戒を促す機能を実装する。これにより、ユーザーは即座に詐欺である可能性を認識し、適切な対応を取ることができるようになる。 (Example 1)
The embodiment of the present invention is a system that includes a response means in which a generative AI responds to an incoming call from an unregistered or withheld phone number, and a sending means that documents the matter using ChatGPT after the call ends and sends it via a messenger app or email. In addition, the system includes a confirmation means that uses a dedicated question or password to verify a caller who claims to be a family member or acquaintance, a connection means that lets a delivery company reach the user by using a password for delivery companies, and a linking means that checks the phone number during the call and links with the police if the number has a history of misuse.
The response means is realized by the specific processing unit 290 of the data processing device 12, and responds to an incoming call from an unregistered or withheld phone number using a generative AI. The sending means is realized by the specific processing unit 290 of the data processing device 12 as a function of documenting the matter using ChatGPT after the call ends and sending it via a messenger app or email. The confirmation means can be realized by the control unit 46A of the smart glasses 214 as a function of verifying family members and acquaintances using dedicated questions or passwords. The connection means can be realized by the control unit 46A of the smart glasses 214 as a function that lets a delivery company reach the user by using a password for delivery companies. The linking means is realized by the specific processing unit 290 of the data processing device 12 as a function of checking the phone number during a call and linking with the police if the number has a history of misuse. The correspondence between each means and the device or control unit is not limited to the above example, and various changes are possible.
The response means includes a speech understanding function that uses voice recognition technology, and responds to calls from unregistered or withheld numbers in natural language generated by a pre-trained AI. This AI has been trained on a large amount of call data and can generate responses for a variety of scenarios. When a call is received, the AI analyzes the content of the call in real time and selects an appropriate response to advance the conversation. The response content is customized to the nature of the call, so questions from sales calls are answered politely and suspicious calls are answered cautiously. In addition, to handle calls from unregistered or withheld phone numbers, the response means can use speech synthesis technology and speech content understanding technology so that the pre-trained AI responds appropriately. This AI can learn multiple response scenarios, grasp the intention of the call, and provide natural dialogue suited to the situation. Because the AI instantly selects a response based on the content of the call and tailors it to the purpose of the call, inquiries from sales calls can be handled appropriately and suspicious calls can be answered cautiously.
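The decision of when the AI takes the call, and how it tunes its manner to the nature of the call, can be sketched as follows. The function names, the use of `None` to represent a withheld number, the call-nature labels, and the `"neutral"` default style are assumptions made for illustration; how the nature of a call is actually determined is not specified in the source.

```python
from typing import Optional


def should_ai_answer(caller_number: Optional[str], contacts: set) -> bool:
    """True when the number is withheld (None) or not registered in the contacts."""
    return caller_number is None or caller_number not in contacts


# Hypothetical mapping from the detected nature of a call to a response style.
RESPONSE_STYLE = {
    "sales": "polite",
    "suspicious": "cautious",
}


def response_style(call_nature: str) -> str:
    """Return the response style, defaulting to neutral for unlisted call types."""
    return RESPONSE_STYLE.get(call_nature, "neutral")
```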
The transmission means includes voice recognition technology that converts the contents of the call into text and natural language generation technology that generates a summary of the call based on that text. When the call ends, the AI converts the contents of the call into text data and summarizes and documents the contents. The generated document is sent to the user via a communication means such as a messenger app or email. This process allows the user to check the details of the call at a later date and ensures that important information is not overlooked. The transmission means can also convert the contents of the completed call into text using voice recognition technology, then summarize it using natural language processing technology, and send the generated document to the user via a messenger app or email. The user can check the details of the call at any time through the sent document, and important information is recorded and will not be overlooked.
The verification means has the function of verifying the identity of the caller by using specific questions or passwords when the caller claims to be a family member or acquaintance. In this method, the AI uses questions and passwords set by the user, and verifies the identity of the caller by providing the correct response. If identity verification is successful, the AI continues the call and transfers the call to the user if necessary. This process prevents unauthorized access and fraud while ensuring smooth communication if the caller is the real person. The verification means can also have the AI present questions and passwords set by the user to callers claiming to be family members or acquaintances, and continue the call or transfer the call to the user only if an accurate answer is given. This function prevents unauthorized access and fraud, while enabling smooth communication if the caller is confirmed to be the real person.
The connection means allows a person in a specific occupation, such as a delivery worker, to speak with the user by giving a password for that occupation. The AI receives the occupational password, verifies that it is correct, and then connects the call to the user. The occupational password is important for ensuring the security of the call, and if an incorrect password is given, the call will not be connected. The connection means can also have a function in which the AI receives a password specific to the occupation, such as that of a delivery worker, verifies its validity, and then connects the call to the user. The security of the call is ensured by this password, and if an incorrect password is given, the call will be disconnected.
The linking means has a function of checking the phone number in real time during a call, and automatically linking to the police or related organizations if the number has a history of misuse. This means refers to a list of suspicious numbers stored in a database, and checks whether the incoming number is included in the list. If it is determined that the call is from a suspicious number, the system immediately starts a reporting protocol to the police, and takes appropriate action depending on the situation. The linking means can also monitor the number being called in real time, and automatically link to related organizations such as the police if the number has a history of misuse in the past. When a suspicious number is detected, a reporting protocol to the police is activated, and a prompt response is taken.
These means are implemented by combining multiple AI technologies and linking functions to enhance user security and convenience. The data processing device and the control unit of the smart device cooperate to provide a platform for realizing these means flexibly. Each means is designed not only to function independently but also to work with the others to provide more advanced services. Through these means, users can automate responses to calls from unregistered or withheld phone numbers and protect themselves from risks that arise in daily life. These means are built around AI technology to improve user security while streamlining daily communication. As an example of data collection that does not rely on sensors, the user can enter information manually and the system can use that information. Through these means, users can automate responses to unregistered or withheld calls and strengthen protection against unforeseen events.
Further functions can be added to the system of this embodiment to enhance its convenience. For example, the response means can be added with a function to analyze the user's emotions during a conversation, and the AI can infer the emotions from the tone of the other party's voice and manner of speaking and respond accordingly. This allows for a more careful and empathetic response when the other party is feeling angry or anxious, improving the quality of the call.
A feature can be added to the sending means that automatically integrates the documented call contents into the user's calendar and reminders, helping the user remember appointments and tasks that arise during calls and improving productivity.
A function can be added to the verification means that verifies identity using biometric information. For example, biometric information obtained from a smartwatch or fitness tracker can be used to verify that the person on the other end of the line is actually a family member or acquaintance. This reduces the burden on users while achieving a higher level of security.
A function can be added to the connection means that authenticates a delivery worker by having them scan a QR code (registered trademark) or NFC tag and then connects the call directly to the user. This eliminates the need for a password and allows faster and safer communication with the delivery company.
A function can be added to the linking means that detects suspicious keywords in the voice data during a call and automatically analyzes the content. If a call is deemed suspicious based on the keywords, the system first warns the user before reporting it to the police, helping the user make a more accurate decision.
With these additional functions, the system of this embodiment not only automatically answers calls, but also more efficiently supports security and schedule management in daily life. In addition, by linking various AI technologies with databases, it provides greater security and convenience to the user's life. Furthermore, by integrating these technologies with the user's smart devices and IoT devices in the home, it becomes possible to realize a more personalized experience.
The system of this embodiment can be further expanded, and in order to increase convenience and security, the response means can be added with a function to analyze the caller's voiceprint and compare it with the voices of registered family members and acquaintances. This makes it possible to more accurately determine whether the caller is the real person and to continue the call safely without using specific questions or passwords.
The transmission means can incorporate a function that uses voice recognition and natural language processing to translate the contents of the call into multiple languages selected by the user. This facilitates communication between users who speak different languages and supports use in international business situations and multilingual households.
The verification means can include a function that verifies a person's identity by having them make specific gestures or movements in front of a camera, making it possible to verify identity easily and quickly during a call and further enhancing security.
A function can be added to the connection means in which the AI verifies the caller's identity by comparing the caller's location information with delivery schedule information. This makes it possible to confirm that the delivery company is calling from a location where goods are actually being delivered, thereby confirming the authenticity of the call.
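One way to picture the comparison of caller location with delivery schedule information is a simple distance check. The sketch below assumes, purely for illustration, that the caller's position and the scheduled stops are available as latitude and longitude pairs; the `Stop` type, the 1 km threshold, and the coordinates are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Stop:
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on a spherical Earth, in km."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def near_scheduled_stop(caller: Stop, stops: Iterable[Stop], max_km: float = 1.0) -> bool:
    """True if the caller is within max_km of any stop on today's schedule."""
    return any(
        haversine_km(caller.lat, caller.lon, s.lat, s.lon) <= max_km for s in stops
    )


schedule = [Stop(35.6812, 139.7671)]                            # hypothetical delivery stop
print(near_scheduled_stop(Stop(35.6830, 139.7671), schedule))   # about 0.2 km away
print(near_scheduled_stop(Stop(34.7025, 135.4959), schedule))   # a different city
```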
A function can be added to the linking means that automatically notifies not only the police but also the user's designated emergency contact when a suspicious call is detected, enabling a rapid response in an emergency.
These additional functions allow for flexible response to user needs while maintaining the basic structure of the existing system. In addition, by linking with smart devices and IoT devices in the home, the system provides a seamless experience in the environment in which the user uses the system on a daily basis. With these additional functions, the system of this embodiment goes beyond automatic call response and improves the security and convenience of the user in all aspects of daily life.
(Example 2)
This embodiment of the present invention is a system having a sending means that can send the matter to family members when the matter is documented with ChatGPT after the call ends and sent via a messenger app or email. Specifically, the matter can be sent to family members by selecting the document created after the call ends and specifying "family members" as the destination.
The sending means is realized by the specific processing unit 290 of the data processing device 12 as a function of selecting the document generated after the call is ended and sending the message to the family by specifying the family as the destination. This sending means can also be realized by the control unit 46A of the smart glasses 214. The correspondence between each means and the device or control unit is not limited to the above example, and various changes are possible.
The means for documenting the contents after the end of the call includes a conversation content analysis means using a voice recognition technology. This analysis means converts the voice data of the call into text and extracts the main points of the call. In addition, the conversation content analysis means using the voice recognition technology has a function of converting the voice data generated after the end of the call into text and extracting the main points of the call from the text data. The voice recognition technology utilizes a model based on deep learning and is trained to support various languages and accents. In addition, in order to support various languages and accents, this analysis means utilizes a deep learning model and proceeds with learning based on a large amount of voice data. The conversation content analysis means performs morphological analysis and syntactic analysis to identify important information exchanged in the conversation and content requiring action. In particular, a keyword extraction function is used to identify words and phrases frequently used in the conversation, and the main points of the call are documented based on these. In addition, in order to clarify the main points of the conversation, a morphological analysis function and a syntactic analysis function are combined to identify important information exchanged during the conversation and content requiring action. The keyword extraction function identifies words and phrases that appear frequently in the conversation, and documents based on this information.
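The frequency-based keyword extraction mentioned above can be illustrated with a short sketch. It assumes an English transcript and uses a regular expression plus a small hand-written stopword list as a stand-in for the morphological and syntactic analysis named in the text; the function name and the stopword list are hypothetical.

```python
import re
from collections import Counter
from typing import List

# Hypothetical stopword list; a real system would use a fuller one per language.
STOPWORDS = {
    "the", "a", "an", "to", "and", "of", "is", "at", "on", "for",
    "will", "be", "you", "it", "that", "this", "in", "your", "please",
}


def top_keywords(transcript: str, k: int = 3) -> List[str]:
    """Return the k most frequent non-stopword tokens in the transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(k)]


transcript = (
    "The delivery will arrive Friday. "
    "Please confirm the delivery address for the delivery on Friday."
)
print(top_keywords(transcript, 2))
```

For Japanese call transcripts, the tokenization step would need a morphological analyzer instead of the regular expression used here.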
The means for formatting the documented content into a message includes a text editor function, which organizes the parsed text and arranges the document structure. The message formatting means provides an interface that allows a user to easily check the content and add edits or additional information as necessary. The means for formatting the documented content into a message also includes a text editor function, which organizes the parsed text and arranges the document layout. The means for formatting the message provides an interface that allows a user to check the content of the message and add edits or additional information.
The sending means sends a message to a specific family member using a destination designation function. This destination designation function works in conjunction with the user's contact list to automatically send a message based on the contact information of the selected family member. The sending means also has a function of confirming message transmission and managing a transmission history, so that the user can track the status of the sent message and resend it as necessary. The message is sent in conjunction with a messenger app or an email app. The linking function with the messenger app or the email app performs authentication using the user's account information and transmits the message safely. The sending means also encrypts the message as a security measure and performs an authentication process at the time of transmission to ensure privacy protection. The sending means also transmits the message in an appropriate format depending on the message transmission method selected by the user. For example, the message is sent as an instant message in the messenger app, and as an email in the email app. The message sending means has a destination designation function for sending a message to a family member selected by the user, and works in conjunction with the user's contact list to automatically send a message based on the contact information of the family member. The sending means also includes a function for checking the sending status of a message and managing the sending history, so that the user can track the status of the sent message and resend it if necessary. The message sending process has a function for linking with a messenger application or an email application, authenticating the user based on the account information, and sending the message securely. The sending means protects privacy by encrypting the contents of the message and implementing security measures such as authentication at the time of sending. 
In addition, the message is sent in an appropriate format according to the sending method selected by the user, such as an instant message in the messenger application or an email in the email application.
An example of data collection that does not include sensors is when a user manually takes notes on a phone call. In this case, the user records the main points of the call after the call ends, and sends the text data to the family as a message. The call contents manually recorded by the user are organized using a text editor function and sent to the family as a formatted message by the sending means. This manual recording is suitable for situations where automatic documentation using voice recognition technology cannot be applied, or when the user wants to convey specific information in his or her own words. Another possible method of data collection that does not include sensors is a scenario in which a user manually takes notes on a phone call after the call ends, and sends the information to the family as a message. This manual recording is organized using a text editor function and sent as a formatted message. This is used in situations where voice recognition technology cannot be applied, or when the user wants to convey specific information in his or her own words.
In an embodiment of the present invention, functions beyond documenting the contents of a call can be considered. For example, a function can be added that automatically registers appointments and events mentioned during a call in a calendar in cooperation with a schedule management system based on the text generated by speech recognition. This can save the user the trouble of manually managing the schedule after the call. In addition, it can suggest adding events to a calendar shared among family members, making it easier for all family members to share the schedule.
It is also possible to automatically generate task lists based on documented text data and post them to a shared platform that can be accessed by all family members, where each family member can update the progress of tasks and check off completed tasks, creating an environment for the whole family to share information and cooperate.
In addition, a function can be added that performs sentiment analysis on documented messages and reflects the emotional nuances expressed during a call in the text, helping message recipients understand the caller's intent more accurately. For example, if emotions such as joy or anxiety are expressed during a call, those emotions are conveyed in the text with specific emojis and formatting, enriching communication.
Another possible function would be to use voice recognition and analysis to automatically generate FAQs and a list of frequently asked questions from the content of calls, building a knowledge base that family members can refer to when they have similar inquiries. This knowledge base would be shared among family members and updated every time a new call occurs, improving the efficiency of communication between family members.
In addition, the text generated after the call ends can be sent not only via messenger apps or email, but also played back in audio format, making it easier for visually impaired family members or children with difficulty reading and writing to receive the information.
Finally, to protect the privacy of family members, sensitive information in the document created after the call ends can be identified and automatically anonymized or masked, providing an environment in which information can be shared with peace of mind. This balances respect for each individual's privacy with sharing only the information that is necessary.
The embodiment of the present invention relates to automatic documentation and transmission of call contents, and new functions can be added to the document. For example, the document can be analyzed, and action items can be automatically generated after the call ends, and a reminder can be set on the user's smart device as a task that needs to be addressed. The reminder can be prioritized based on deadlines and importance mentioned in the call, helping the user to take action without forgetting.
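The prioritization of reminders by deadline and importance can be sketched as a sort over extracted action items. The `ActionItem` fields, the 1 to 3 importance scale, and the sample items are hypothetical; how deadlines and importance are extracted from the call is outside this sketch.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ActionItem:
    title: str
    due: Optional[date]  # deadline mentioned in the call, if any
    importance: int      # hypothetical scale: 1 (low) to 3 (high)


def prioritize(items: List[ActionItem]) -> List[ActionItem]:
    """Earliest deadline first, higher importance breaks ties, undated items last."""
    return sorted(
        items,
        key=lambda i: (i.due is None, i.due or date.max, -i.importance),
    )


items = [
    ActionItem("Call back the clinic", None, 3),
    ActionItem("Sign for parcel", date(2024, 4, 2), 1),
    ActionItem("Pay invoice", date(2024, 4, 2), 3),
    ActionItem("Confirm visit", date(2024, 4, 1), 2),
]
ordered = [i.title for i in prioritize(items)]
print(ordered)
```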
Additionally, to enhance communication among family members, a function can be added that automatically attaches relevant images, videos, and links based on specific words or phrases in a documented message. This allows visual information to be shared alongside text-based messages, making communication richer.
In addition, the system can enhance security by automatically detecting and blurring certain personal or confidential information when documenting call transcripts, in accordance with privacy protection guidelines and natural language processing techniques.
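As a rough illustration of masking personal information in a documented call, the sketch below replaces phone numbers and email addresses with placeholders. The two patterns are deliberately simple assumptions (hyphenated phone numbers and common email shapes) and would miss many real-world formats; the function name and placeholders are hypothetical.

```python
import re

# Simplified, assumed patterns; real transcripts need broader coverage.
PHONE = re.compile(r"\b\d{2,4}-\d{2,4}-\d{4}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def mask_sensitive(text: str) -> str:
    """Replace phone numbers and email addresses with fixed placeholders."""
    text = PHONE.sub("[PHONE]", text)
    return EMAIL.sub("[EMAIL]", text)


print(mask_sensitive("Call 090-1234-5678 or mail taro@example.com today"))
```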
When converting call contents into text, it is also effective to incorporate a translation function into multiple languages to meet the diverse needs of users. In a multicultural environment where family members speak different languages, it is possible to automatically translate the call contents and send messages in a language that each member can easily understand.
To further enhance convenience, a schedule function will be added that allows users to customize the timing of message sending. Users can set messages to be sent at a specific date and time, rather than just immediately, allowing them to optimize the timing at which their family members receive information.
Finally, quick action buttons for replying to or confirming messages can be provided so that recipients can easily act on received messages. This enables quick feedback and efficient communication, making information sharing between family members even smoother.
(Example 3)
This embodiment of the present invention is a system that checks the phone number during a call and has a linking means for cooperating with the police if the number has a history of misuse. Specifically, the system checks the incoming number against a database during a call and automatically notifies the police if it detects that the number has a history of misuse.
The police cooperation means is realized by the specific processing unit 290 of the data processing device 12 as a function of checking the incoming number against a database during a call and automatically reporting to the police if it is detected that the number has a history of misuse. This cooperation means can also be realized by the control unit 46A of the smart glasses 214. The correspondence between each means and the device or control unit is not limited to the above example, and various changes are possible.
The extraction means extracts the caller information contained in the signal of each call sent through the communication network, and can accurately obtain the caller's telephone number from the call data using digital signal processing technology. The extraction means also has a signal analysis function for analyzing Caller ID information and identifying the telephone number. The system can also accurately identify the calling number by using a signal extraction function that extracts the caller's telephone number from the call signal together with an analysis algorithm specialized in decoding Caller ID information.
The matching means matches the extracted telephone number against a database that collects telephone numbers suspected of fraudulent use and organizes them by category. The matching is performed in real time through a database management system, and high-speed search algorithms and indexing technology allow rapid matching while the call is in progress. The matching means also has a high-performance database search function and indexing function for rapid search and matching against a database that lists numbers with a history of misuse.
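The indexed matching described above can be illustrated with an in-memory hash index keyed by a normalized number. This is only a sketch under stated assumptions: a real deployment would use a database management system as the text says, and the `MisuseIndex` class, the default country code, and the sample entries are hypothetical.

```python
import re
from typing import Dict, Optional


def normalize_number(raw: str, default_country: str = "81") -> str:
    """Reduce a dialled or displayed number to one '+<digits>' form."""
    digits = re.sub(r"\D", "", raw)
    if raw.strip().startswith("+"):
        return "+" + digits
    if digits.startswith("0"):
        # Assumed national format: replace the trunk prefix with the country code.
        return "+" + default_country + digits[1:]
    return "+" + digits


class MisuseIndex:
    """Hash index from normalized number to misuse category."""

    def __init__(self, entries: Dict[str, str]) -> None:
        self._by_number = {normalize_number(n): cat for n, cat in entries.items()}

    def lookup(self, incoming: str) -> Optional[str]:
        """Return the misuse category, or None if the number is not listed."""
        return self._by_number.get(normalize_number(incoming))


index = MisuseIndex({"03-1234-5678": "refund fraud"})  # hypothetical entry
print(index.lookup("+81 3 1234 5678"))
print(index.lookup("090-0000-0000"))
```

Normalizing before both insertion and lookup is what lets differently formatted presentations of the same number hit the same index entry.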
The linking means has a function of automatically reporting to the police when a number with a history of abuse is detected, includes an interface function with the reporting system, generates report information in a data format based on the protocol with the police reception system when reporting, and transmits it to the police reception system using a secure communication channel. The contents of the report information include information such as the detected abused phone number, the date and time of the call, the duration of the call, and the telecommunications carrier used by the caller, and uses encryption technology and authentication systems to protect personal information and ensure the accuracy of the report. In addition, the linking means is compatible with multiple communication protocols and reporting systems, has an adapter function for smooth data exchange between the systems, and has a function of adjusting the format of the report information to meet the requirements of the police reception system in the reporting process, and selecting an appropriate reporting protocol to make the report. When the reporting process is activated, the system transmits report information to the police reception system and obtains a report reception confirmation. This confirmation is used as information for the system to record that the report was made correctly and store it as a report history. The report history is used for future analysis and improvement, contributing to the efficiency and accuracy of the report process. This linking function uses a report data generation function to create a report format containing the necessary information, and transmits the information to the police reception system via a secure report transmission function. Data encryption and system authentication functions are implemented to ensure the security and reliability of reports. 
In addition, the system's linking means has an adapter function that is compatible with multiple report protocols and enables data exchange with different report systems, and this adapter function automatically adjusts to a format suitable for the protocol requirements at the time of reporting using a report protocol selection function, and transmits report information and confirms receipt. The report history recording function records the success of reports, which is used to analyze and improve system performance.
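The report information and report history described above can be pictured as a small data structure. The sketch below only builds the payload and records an acknowledgement; it does not transmit anything, since the protocol of the police reception system is not specified in the text. The field names, the JSON format, and the `ACK-001` identifier are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class MisuseReport:
    number: str            # detected number with a history of misuse
    call_started_at: str   # ISO 8601 date and time of the call
    duration_seconds: int  # duration of the call
    carrier: str           # telecommunications carrier used by the caller


def build_payload(report: MisuseReport) -> str:
    """Serialize the report; JSON is an assumed format, not one from the source."""
    return json.dumps(asdict(report), ensure_ascii=False, sort_keys=True)


def record_ack(history: List[Dict[str, str]], report: MisuseReport, ack_id: str) -> None:
    """Store the reception confirmation so the report history can be analyzed later."""
    history.append(
        {
            "number": report.number,
            "ack_id": ack_id,
            "recorded_at": datetime.now(timezone.utc).isoformat(),
        }
    )


report = MisuseReport("+81312345678", "2024-03-28T10:15:00+09:00", 42, "ExampleTel")
payload = build_payload(report)
history: List[Dict[str, str]] = []
record_ack(history, report, "ACK-001")  # hypothetical confirmation identifier
print(payload)
```

An adapter for a specific reporting protocol would take the same `MisuseReport` and produce whatever format that protocol requires.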
As an example of data collection without sensors, a function for users to report suspicious calls on an application or web interface is provided. The user enters their experience of the call and any suspicious points they noticed during the call into a form, and the information is registered in a database. The data collected through this manual report can be added to a list of phone numbers that are automatically matched in the database, contributing to improved accuracy in detecting numbers with a history of abuse. As an example of data collection without sensors, an input function is provided for users to report suspicious calls. Through an interactive reporting form, users register the contents of suspicious calls in a database, and the information collected is used to detect numbers with a history of abuse. This manual reporting system contributes to the expansion of the number matching database by providing additional data based on the user's experience and intuition.
The system can be equipped with a function to strengthen the database update mechanism. For example, newly confirmed abused numbers will be automatically added to the database even after they are reported. In addition, when a suspicious call is reported, the credit information of the number will be checked against other databases and combined with information based on user reports to more accurately identify abuse history. To ensure the integrity of the database, a regular cleaning process will be carried out to eliminate incorrect and outdated information. In addition, to strengthen cooperation with the reporting system, a function will be introduced to directly link with the crime database provided by the police, obtain crime information in real time during the matching process, and improve the accuracy of the matching results.
To improve the immediacy of reports, the matching process begins the moment the call is initiated, and if a number with a history of abuse is detected, the user is given the option to sound an alarm or have the call hung up automatically. In addition, when the call is hung up, a recording of the call can be saved on the user's behalf to assist police investigations. When police intervene, a detailed report including the caller's location and call history is automatically generated, helping to speed up criminal investigations.
The user interface will be updated to include a feature that allows users to check the progress of the reporting process in real time to increase transparency in the reporting system. Users will be able to check the results of their reports and feedback from the police, which will increase trust in the system. The system will also provide statistical data and trend analysis on abused numbers, providing information to users to be cautious about calls.
Furthermore, machine learning technology will be introduced to identify numbers with a history of abuse, and new numbers likely to be abused will be predicted by analyzing various indicators such as call patterns and call frequency. This will enable proactive updates to the database, strengthening measures to prevent unknown criminal activity. A function will also be added that allows users to provide direct feedback on the effectiveness of the reporting system, which will help improve the system. Feedback will be provided anonymously, protecting user privacy while collecting valuable information that will contribute to improving the system.
To make the phone number check during a call more effective, the AI will learn the patterns of the various scams that users may face, and a function will be implemented that raises a flag in real time when certain words or phrases are detected during a call. This will allow malicious behavior to be inferred and detected from the content of the call, rather than simply matching the phone number with abuse history in the database. In addition, the reporting process will be simplified by providing shortcuts and buttons on the smartphone interface that allow users to easily report calls that they suspect are fraudulent. This will allow the database to be updated more quickly and improve protection for other users.
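Flagging on specific words or phrases can be sketched as a substring check over each transcribed utterance. The phrase list below is a hypothetical fixed example; the text describes patterns learned by the AI, which this sketch does not attempt to reproduce.

```python
from typing import List

# Hypothetical phrases; the text describes these as learned from scam patterns.
SUSPICIOUS_PHRASES = ("gift card", "wire transfer", "refund fee", "bank pin")


def flag_phrases(utterance: str) -> List[str]:
    """Return the suspicious phrases found in one transcribed utterance."""
    text = " ".join(utterance.lower().split())  # lowercase and collapse whitespace
    return [phrase for phrase in SUSPICIOUS_PHRASES if phrase in text]


print(flag_phrases("Please buy a Gift  Card and read me the code"))
print(flag_phrases("Your parcel arrives on Friday"))
```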
To strengthen cooperation with the police, the reporting system will be integrated with location tracking capabilities to help police respond quickly based on reported information, helping them track and capture criminals. Artificial intelligence will also be built into the reporting system to analyze crime patterns from report data and provide police with information to plan preventive vigilance activities. Using such predictive analytics will help prevent future crimes.
To facilitate data sharing with the police, the system will receive real-time information on fraud cases and other crimes known to the police and update the database. This will ensure that the reporting system operates based on the latest crime information and strengthen measures to protect users. In addition, the system will be equipped with a block list function, allowing users to register suspicious numbers and block calls from them. This will allow users to directly control the risks themselves.
Educational programs will include online courses and workshops to provide users with information to recognize and prevent fraud methods. This will provide users with the knowledge to protect themselves and raise security awareness in society as a whole. Examples of fraud cases that were prevented by using the reporting system will also be shared to help users understand the effectiveness of the system.
Finally, through a system update, if a call is deemed likely to be fraudulent, a warning message will be automatically sent to the user to warn them against fraud. This will allow users to immediately recognize the possibility of fraud and take appropriate action.

Furthermore, an emotion engine that estimates the user's emotion may be combined. That is, the specific processing unit 290 may estimate the user's emotion using the emotion identification model 59 and perform the specific processing using the user's emotion.

(Form Example 1)
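This form of carrying out the invention is a system comprising a response means in which a generative AI responds when a call arrives from an unregistered or withheld phone number, and a sending means that documents the matter with ChatGPT after the call ends and sends it via a messenger app or email. In addition, as an emotion recognition means incorporating an emotion engine that recognizes the user's emotions, the system analyzes the user's tone of voice, word choice, and the like during the call and estimates the user's emotion. Based on the estimated emotion, it adjusts the generative AI's responses and the wording of the documented matter. For example, when the user shows anxiety, gentler wording is used to provide reassurance.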
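The response means is realized by the specific processing unit 290 of the data processing device 12, and the generative AI responds to calls from unregistered or withheld phone numbers. The sending means is realized by the specific processing unit 290 of the data processing device 12 as a function of documenting the matter with ChatGPT after the call ends and sending it via a messenger app or email. The emotion recognition means is realized by the specific processing unit 290 of the data processing device 12 or the control unit 46A of the smart glasses 214 as a function of analyzing the user's tone of voice and word choice during the call and estimating the emotion. The function of adjusting the generative AI's responses and the wording of the documented matter based on the estimated emotion is likewise realized by the specific processing unit 290 of the data processing device 12 or the control unit 46A of the smart glasses 214. The correspondence between each means and the device or control unit is not limited to the above example, and various changes are possible.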
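The response means includes a dialogue generation means that combines natural language processing and speech synthesis. This dialogue generation means responds automatically to unregistered or withheld incoming calls and enables dialogue with the user. When an unregistered or withheld incoming call is detected, the generative AI responds using a pre-trained conversation model. This conversation model has been trained on diverse dialogue data so that it can handle various scenarios, and it provides appropriate answers and guidance in response to the caller's questions and requests. The response means also has an AI-based dialogue generation function and can automatically give appropriate replies to calls from unregistered or withheld phone numbers. This function combines a dialogue management function, which understands the caller's purpose and requests in the initial stage of the call and responds accordingly, with a speech generation function, which can convert the generated response into natural speech. The dialogue management function can analyze the caller's intention based on the detection of specific keywords and phrases and generate an appropriate reply. The speech generation function can convert text-based responses into speech in real time and provide the caller with a natural conversational experience.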
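The sending means includes a documentation means that uses a chatbot and natural language understanding technology. After the call ends, an advanced natural language understanding model such as ChatGPT is used to document the call contents accurately. The documentation means is capable of extracting and summarizing the main points of the call, and can generate a document that conveys the matter concisely and clearly. The generated document is sent to the user via a messenger app or email. This process includes a function for linking to the user's email address or messenger account, and the document is sent automatically in an appropriate format. The sending means can also convert the call contents into text and provide it in a form accessible to the user. When the call ends, the call documentation function is activated and can extract and summarize the main points of the conversation. The summarized text is sent to the email address or messenger app designated by the user through an automatic delivery function. This automatic delivery function includes an email sending function and an app linking function for putting the document into an appropriate format and delivering it reliably to the designated destination.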
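The emotion recognition means includes a voiceprint analysis means that performs voice analysis and a language analysis means that estimates emotion from word choice. The voiceprint analysis means can use the microphone of the smart device to detect features such as the tone, pitch, and speed of the user's voice, and estimate the user's emotional state from those voice characteristics. The language analysis means can analyze the content of the user's utterances and extract emotional context from the words and phrases used. These analysis results are used when the generative AI responds and when the wording of the documented matter is adjusted. For example, when the user shows anxiety, the wording of the response and the document is adjusted to be gentler and more reassuring. The emotion recognition means also has a function of estimating emotion from the features of the user's voice and language use during the call. The voiceprint analysis function can extract features such as the pitch, intensity, and speed of the voice from the voice data collected by the microphone, and estimate emotion by analyzing these characteristics. The linguistic emotion analysis function can process the language data during the call, analyze the emotional meaning of the words and phrases used, and grasp the user's emotional state. These analysis results are used to adjust the tone of the AI's responses and the wording of the documented call contents, so that an appropriate emotional response can be provided to the user.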
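Data collection means that do not include sensors include text data entered by the user and feedback on system use. These are provided to the system through a user input reception function and a feedback collection function, and are used as a valuable source of information for improving the service.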
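These means are intended to respond quickly and effectively to the user's requests and to improve the quality of communication. The implementation of each means is carried out flexibly by the data processing device or the control unit of the smart device, and can be changed in various ways to increase the efficiency and usability of the system.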
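As an additional function, the system of the present invention can include an outline prediction means for inferring the caller's intention before responding to an unregistered or withheld incoming call. This allows the response means to generate more accurate dialogue and realizes an exchange that is meaningful to the user. In addition, when the generative AI responds, it can be given a language selection function based on the caller's country or region, enabling multilingual automatic responses.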
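Regarding the sending means, in addition to documenting the call contents, a function for highlighting important keywords and phrases is provided so that the user can grasp the document quickly. Furthermore, a function can be added that automatically generates action items based on the documented contents and adds them to the user's task list.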
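In the emotion recognition means, when the user's emotion changes during the call, a function can be provided that detects the change in real time and dynamically adjusts the tone and tempo of the response means' dialogue. In addition, when a specific emotion is detected, a protocol can be incorporated that prompts contact with a specialist who provides special support or advice corresponding to that emotion.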
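A personalization function can be added to the response means that analyzes the caller's past call history and related data and provides more personalized responses. This makes it possible to provide information that is more relevant to the user and to increase the usefulness of the responses.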
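Regarding the sending means, a function can be added that proposes follow-up actions based on the documented call contents. For example, the user's time management is supported by integrating a function that automatically registers tasks and appointments contained in the call contents in a calendar app.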
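Furthermore, the emotion recognition means can have a function of detecting the user's stress level and tension and, as appropriate, providing advice for reducing stress or links to relaxation content. This supports the user's mental health and promotes overall well-being.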
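For the system of this form example, a number of additional functions are considered for further improving its functionality. For example, a voiceprint recognition means can be added that analyzes the caller's voiceprint for an unregistered or withheld incoming call and identifies the caller by matching it against previous call data. In this way, if the caller has interacted with the system in the past, the response means can respond more appropriately based on that information. The voiceprint recognition means also functions as a security measure and provides the user with a highly reliable call experience.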
送信手段についても、通話内容を文書化する際に、通話の内容を構造化し、情報の重要度に応じてテキストの階層化を行う機能を考慮する。これにより、ユーザーは文書を読む際に重要な情報をより迅速に把握できるようになる。さらに、文書化された内容をユーザーの好みや過去の行動パターンに合わせてカスタマイズすることで、より個人的な体験を提供することが可能となる。
感情認識手段では、通話中にユーザーのストレスレベルを検知し、ストレスが高いと推定される場合には、通話内容に関連したリラクゼーション方法や心理的サポートへの案内を提供する。これにより、ユーザーが通話を通じてリラックスし、ストレスを軽減できるようなサービスを提供する。
さらに、応答手段には、通話内容に基づいてユーザーへのフォローアップアクションを自動的に提案する機能を追加する。例えば、通話中に提案された製品やサービスに関する追加情報へのリンクを提供したり、次の行動ステップを提案することで、ユーザーの意思決定をサポートする。
応答手段の改善としては、通話者の意図に応じて自動的に応答スタイルを変更する機能を検討する。たとえば、通話者が緊急の状況を示している場合には、迅速かつ的確な指示を提供するようにAIを調整する。このような対応により、通話者のニーズに即応できるようなシステムを実現する。
これらの追加機能は、ユーザーエクスペリエンスの向上を目指すとともに、通話内容の正確な把握と迅速な対応を可能にするためのものである。また、それぞれの機能は、データ処理装置やスマートデバイスの制御部の能力を最大限に活用し、システムの有用性をさらに高めることが期待される。
（形態例２）
本発明を実施するための形態は、通話終了後に用件をチャットGPTで文書化し、メッセンジャーアプリやメールで送信する際に、ユーザの感情を認識する感情エンジンを使用して文書の表現を調整する調整手段とを具備するシステムである。具体的には、文書化された用件を感情エンジンに入力し、ユーザの感情を分析する。分析結果に基づいて、文書の表現を適切に調整する。例えば、ユーザが喜びや興奮を感じている場合には、より明るく活気のある表現を使用することで、ユーザの感情を共有する。
調整手段は、通話終了後に用件をチャットGPTで文書化し、メッセンジャーアプリやメールで送信する際に、ユーザの感情を認識する感情エンジンを使用して文書の表現を調整する機能として、データ処理装置１２の特定処理部２９０によって実現される。文書化された用件を感情エンジンに入力し、ユーザの感情を分析する機能も、データ処理装置１２の特定処理部２９０によって実現される。分析結果に基づいて文書の表現を適切に調整する機能は、データ処理装置１２の特定処理部２９０またはスマート眼鏡２１４の制御部４６Ａによって実現される。各手段と装置や制御部との対応関係は、上述した例に限定されず、種々の変更が可能である。
会話内容抽出手段は、音声認識技術を駆使して通話内容をテキストデータに変換する。また、会話内容抽出手段は、音声信号を受信した後、ノイズリダクション手段を用いて背景雑音を除去し、音声からテキストへの変換精度を向上させる。さらに、会話内容抽出手段は、最先端の音声認識技術を用いて、通話内容を精確にテキストデータへ変換し、その過程でノイズ除去手段が背景雑音を除去し、変換精度を向上させる。
テキスト処理手段は、変換されたテキストデータの構文上の誤りを修正し、言語の流暢さを保つために文法検査手段を介して文法チェックを行う。また、テキスト処理手段は、テキスト化されたデータの文法検査手段が語法の正確性を保証し、文書の自然な流れを維持するための調整を行う。
感情認識手段には、テキストマイニングと感情分析技術に基づく感情抽出手段が含まれ、生成されたテキストデータの言葉遣いや文脈からユーザの感情を推定する。また、感情抽出手段は、様々な感情を表す単語やフレーズ、文法的パターンを識別し、それらをポジティブ、ネガティブ、ニュートラルなどの感情カテゴリに分類する。また、感情強調手段は、抽出された感情に応じてテキストのトーンや言い回しを調整し、ユーザの感情状態をより適切に伝えるための修正を行う。さらに、感情抽出手段は、文書に表れる言語パターンからユーザの感情を読み取り、それを基に文書のトーンを調整する。
調整手段は、感情認識手段によって分析された感情データを基に、テキストの表現を変化させる。また、表現強化手段は、喜びや興奮などのポジティブな感情が検出された場合、使われる語彙をより明るく活気のあるものに置き換え、メッセージに好意的な印象を与える。さらに、表現緩和手段は、悲しみや怒りなどのネガティブな感情が検出された場合、メッセージのトーンを穏やかにし、共感と理解を示すような言い回しを選択する。また、ユーザの感情がポジティブな場合は、表現強化手段が文書に活力を与え、ネガティブな感情を示す場合は、表現緩和手段によって、より穏やかな表現を使用する。
メッセージの送信手段には、メッセンジャーアプリやメールクライアントとの連携機能が含まれ、調整されたテキストを適切な形式で送信する。また、送信プロトコル選定手段は、受信者のプラットフォームや設定に合わせて最適な送信プロトコルを選択し、メッセージの配信を保証する。また、ユーザインタフェース提示手段は、送信前に文書の最終レビューを行うためのプレビュー画面を提供し、ユーザが必要に応じて最終的な修正を加えることができるようにする。さらに、メッセージ送信手段がメッセンジャーアプリやメールクライアントに適したフォーマットで調整されたテキストを送信し、プレビュー画面がユーザが文書を最終確認するためのインターフェースを提供する。
以上のプロセスは、ユーザの使用するデバイスや設定に応じて、スマートデバイスの制御部やデータ処理装置に内蔵された特定処理部で実現される。また、これらの手段は、モジュール化されたコンポーネントとして設計され、システムの構成要素としての交換や拡張が容易に行われる。さらに、各手段の対応関係はフレキシブルに設定されており、システムのアップグレードやカスタマイズに対応するための多様な変更が可能である。また、この一連のプロセスは、デバイスや環境に応じて柔軟に対応できるようにモジュール化されており、システムのアップグレードやカスタマイズが容易に行えるように設計されている。
この形態例を更に拡張して、ユーザーの感情をより深く理解し、コミュニケーションの質を高める機能を追加することができる。例えば、感情エンジンにビデオチャット中の表情認識機能を組み込むことで、視覚的な情報からも感情を分析し、より正確な感情判断を行う。感情認識の精度を向上させるために、ユーザーの声のトーンやピッチの分析も行い、テキストに反映させることが可能である。さらに、ユーザーの過去のコミュニケーション履歴や反応パターンを分析することで、個人の感情表現スタイルを学習し、それに応じたよりパーソナライズされたテキスト調整を実現する。
テキスト処理手段は、表現の多様性と創造性を高めるために、文学作品や詩などからインスピレーションを得た言い回しを提案する機能を持つ。これにより、ユーザーの感情がより豊かに表現される。また、社会的コンテキストや文化的背景を考慮し、コミュニケーションが行われる環境や状況に合わせた適切な表現を選択することもできる。
メッセージ送信手段には、送信されるテキストが受け手の感情に与える影響を予測する機能を追加し、ユーザーがより責任を持ってコミュニケーションを取れるようにする。さらに、受信者の反応をAIが予測し、その情報をもとにユーザーが次に取るべきコミュニケーション戦略を提案することも可能である。
全体として、これらの機能は、ユーザーが感情を的確かつ敏感にコミュニケーションに反映させることをサポートし、より深い人間関係の構築に貢献する。また、これらの進化した手段は、個人だけでなく、企業のカスタマーサポートやCRMシステムにおいても、顧客との関係を深めるために有効活用できる。さらに、これらのシステムは、ユーザー教育やカウンセリングといった人間の感情が重要な役割を果たす分野での応用が期待される。
本システムは、感情エンジンを活用してユーザーの感情に応じた文書の表現調整を提供する。この機能を拡張するために、ユーザーの生体情報を取得するセンサーを統合し、心拍数や皮膚の導電率などの生理的反応から感情をより正確に読み取ることができる。センサーからのデータはリアルタイムで分析され、文書のトーンを即座に調整することが可能となる。
さらに、ユーザーの日常的なコミュニケーションを継続的に分析し、その人固有の表現スタイルや好みを把握する個性化学習機能を搭載する。この機能により、システムはユーザーの個性を反映したより自然で個別化された文書の提案が可能になる。
また、マルチリンガル対応を強化し、さまざまな言語での感情的ニュアンスを捉えることができるようにする。この機能により、国際的なコミュニケーションや多言語を話すユーザー間での理解を深めることができる。
ユーザーのプライバシー保護のために、感情データの匿名化や暗号化を行い、セキュリティを強化する機能も追加する。これにより、ユーザーは安心してシステムを利用できるようになる。
教育やカウンセリングの分野での応用を目指し、感情認識の結果を活用してコミュニケーションスキルのトレーニングをサポートする機能を開発する。トレーニングプログラムには、感情表現の練習や、適切なコミュニケーション手法の学習が含まれる。
最後に、システムのユーザビリティを向上させるために、ユーザーインターフェースをリッチかつ直感的なものにする。さまざまなジェスチャーや音声コマンドをサポートし、ユーザーが文書の調整プロセスに容易に介入できるようにする。これにより、ユーザーは自分の意志で表現を微調整し、より個人的なコミュニケーションを実現できる。
（形態例３）
本発明を実施するための形態は、通話中にユーザの感情を認識する感情エンジンを使用して、応答や対話の内容をユーザの感情に合わせて調整する調整手段とを具備するシステムである。具体的には、通話中にユーザの声のトーンや言葉の選択などを感情エンジンに入力し、ユーザの感情を推定する。推定された感情に基づいて、生成系AIの応答や対話の内容を調整する。例えば、ユーザが悲しい感情を示している場合には、共感の言葉を用いて励ましや支援を提供する。
調整手段は、通話中にユーザの感情を認識する感情エンジンを使用して、応答や対話の内容をユーザの感情に合わせて調整する機能として、データ処理装置１２の特定処理部２９０またはスマート眼鏡２１４の制御部４６Ａによって実現される。通話中にユーザの声のトーンや言葉の選択を感情エンジンに入力し、ユーザの感情を推定する機能も、データ処理装置１２の特定処理部２９０またはスマート眼鏡２１４の制御部４６Ａによって実現される。推定された感情に基づいて生成系AIの応答や対話の内容を調整する機能は、データ処理装置１２の特定処理部２９０またはスマート眼鏡２１４の制御部４６Ａによって実現される。各手段と装置や制御部との対応関係は、上述した例に限定されず、種々の変更が可能である。
音声データ収集手段は、高感度でノイズキャンセリング機能を備えたマイクロフォンが含まれる。このマイクロフォンは、周囲の雑音を除去し、ユーザの声を明瞭に録音するためのデジタル信号処理アルゴリズムを搭載している。また、音声データ収集手段は、ユーザの発話から微細な音響特性を把握する高感度マイクロフォンを採用し、これにデジタル信号処理技術を組み合わせて周囲の雑音を効果的に除去する。このマイクロフォンは、ユーザの声の特性を正確に捉え、感情の変化を検出するための基礎データを提供する。
音声特徴抽出手段は、音声信号から人間の感情を反映する可能性のある特徴量を抽出する。この抽出手段は、音響特徴分析を行うためのスペクトログラム解析機能やピッチ追跡機能、音響モデルを用いた感情識別機能が含まれる。また、音声特徴抽出手段は、スペクトル分析機能やピッチ解析機能といった音響解析ツールを用いて、音声信号から感情を示唆する特徴量を抽出し、これらのデータを感情推定モデルへと供給する。
感情分析手段には、機械学習に基づいた感情推定モデルが含まれ、音声特徴抽出手段によって抽出された特徴量を入力として、ユーザの感情状態を推定する。感情推定モデルは、トレーニングデータに基づいて訓練されたニューラルネットワーク、サポートベクターマシン、決定木などの分類器から構成され、ユーザの感情をポジティブ、ネガティブ、中立などのカテゴリに分類する。感情推定モデルは、継続的な学習を通じてその精度を向上させ、ユーザの感情に対する認識の微妙な変化にも対応できるように進化する。また、感情分析手段は、機械学習技術を活用した感情推定モデルを有し、抽出された音声特徴を元にユーザの感情を分析する。このモデルは、様々な機械学習アルゴリズムを組み合わせて、ユーザの発話から感情カテゴリーを識別し、これに基づいてユーザの感情状態を推定する。推定された感情状態は、ユーザの発話や振る舞いに対するシステムの反応を調整するための重要な情報となる。
対話調整手段には、生成系AIモデルを用いた応答生成機能が含まれ、感情分析手段によって推定された感情に応じて、対話の内容を動的に調整する。応答生成機能は、自然言語生成技術を駆使し、ユーザの感情に適した言葉選びやトーンを用いて応答文を生成する。例えば、ユーザが悲しい感情を示している場合、共感表現や慰めの言葉を含む応答が生成される。この応答生成機能は、会話の文脈を考慮し、ユーザの感情とコミュニケーションの目的に合致した内容を提供するためのコンテキストアウェア処理機能を備える。また、生成された応答は、ユーザにとって自然であり、感情的なニーズを満たすように構築される。応答生成機能は、大規模な会話データセットに基づいて訓練された機械学習モデルによって実現され、ユーザの言葉遣いや話し方に適応することができる。また、対話調整手段には、生成系AIモデルを用いた応答生成機能が含まれ、推定された感情に適切に応じた対話内容を生成する。この機能は、ユーザの感情に対応する言葉選びや対話トーンを選定し、ユーザの現在の感情や会話の文脈に合わせた反応を提供する。応答生成機能は、コンテキストアウェアな処理を行い、ユーザが必要とする情報やサポートを適切な形で提供するために設計されている。この機能は、大量の会話データから学習されたモデルを基に、ユーザの言葉遣いや情緒に適応した応答を生成することが可能である。
センサーを含まないデータ収集手段としては、ユーザがシステムに直接入力したテキスト情報や、通話記録から得られるメタデータが考えられる。これらは、ユーザの行動パターンや好みを分析する際の補足情報として利用され、感情エンジンの精度向上や応答生成機能の最適化に寄与する。ユーザが入力するテキスト情報は、感情分析手段の一部として、感情推定モデルの訓練データとしても活用される。
本システムは、通話中にユーザの感情をリアルタイムで分析し、対話内容を自動調整する能力を有しているため、さらに細かな感情の変化を捉えるために、音声データに加えて、表情認識技術を統合することができる。ウェブカメラやスマートデバイスのカメラを活用して、ユーザの顔の表情を分析し、感情分析の精度を向上させる。この追加された表情認識機能は、ユーザの感情をより正確に認識し、さらに微細な感情変化に応じた対話調整が可能となる。
また、ユーザの生理的シグナルを捉えるためのウェアラブルデバイスを組み込むことも考えられる。心拍数や皮膚電気活動など、生理的反応を測定することで、声のトーンや表情だけでは捉えきれない感情の深層を解析する。これらのデータを統合することで、感情分析の精度はさらに向上し、より適切な対話応答を生成することができる。
対話調整手段には、ユーザの文化的背景や個人的な価値観を考慮したカスタマイズ機能を追加することも有効である。ユーザプロファイルを構築し、その情報に基づいて、対話のトーンや内容をさらにパーソナライズする。これにより、ユーザ一人ひとりに合わせたきめ細やかなサポートを提供することが可能となる。
さらに、感情推定モデルの進化を促すために、クラウドソーシングによる感情データの収集や、多様なユーザからのフィードバックを取り入れて、モデルを継続的にアップデートする仕組みを構築する。これにより、多様な感情表現や言語に対応できる柔軟なシステムとなる。
また、教育やメンタルヘルスケアの分野における応用も検討することができる。例えば、教育分野では、学生の感情に適応した教材の提示やカウンセリングセッションでの使用が考えられる。メンタルヘルスケアでは、ユーザの感情を認識し、ストレスや不安を軽減するための対話支援を行う。これにより、ユーザが抱える問題に対してより効果的なアプローチが可能となる。
システムのプライバシー保護に関しても、ユーザの感情データを安全に保管し、適切なアクセス制御と暗号化技術を用いて、情報漏洩のリスクを最小限に抑えるためのセキュリティ対策を強化する。これにより、ユーザは安心してシステムを利用することができる。
最終的には、このシステムが提供するパーソナライズされた対話体験が、ユーザの生活の質を向上させるようなサービスへと発展することが期待される。
本発明の形態は、通話のみならず、ビデオ会議やオンライン教育のプラットフォームにも適用可能である。例えば、講師が生徒の感情をリアルタイムで把握し、カリキュラムの進行を感情に合わせて調整することで、より効果的な学習経験を提供する。ビデオ会議においても、参加者の感情を反映した対話管理が行われ、生産的かつポジティブな会議環境を促進する。
また、本システムには、感情反応に基づいた健康状態の監視機能を追加することも可能である。例えば、ユーザの声のトーンや話し方が一定期間にわたってネガティブな感情を示している場合、メンタルヘルスの専門家に通知を送り、必要に応じた介入を促すことができる。
さらに、感情エンジンの高度化に向け、ユーザの日常生活における感情パターンを分析し、その情報を元に長期的な感情管理やストレス軽減のアドバイスを提供する機能を組み込む。ユーザの生活リズムや活動パターンを分析し、感情の波を予測することで、適切なタイミングでリラクゼーションやモチベーション向上のためのコンテンツを提案する。
このシステムは、カスタマーサポートの分野でも応用が期待される。例えば、コールセンターのオペレーターが顧客の感情をリアルタイムで把握し、不満や怒りなどのネガティブな感情を検出した際には、即座に対応策を講じ、顧客満足度の向上に寄与する。
また、ゲームやエンターテインメントの分野でも、ユーザの感情に応じてコンテンツを動的に変化させることで、没入感や楽しさを増幅させる効果が期待される。ゲーム内のキャラクターがプレイヤーの感情に反応し、ストーリー展開や対話内容が変化することで、よりパーソナライズされた体験を実現する。
さらに、音声アシスタントや仮想現実（VR）との統合を図り、ユーザの感情に対してより自然な対話を実現する。音声アシスタントはユーザの感情を把握し、個々のニーズに合わせた情報やサービスを提供する。VR環境では、ユーザの感情に応じてシナリオや環境が変化し、リアルタイムで感情に合わせた体験を提供する。
本システムは、ユーザインタフェース（UI）やユーザーエクスペリエンス（UX）の設計においても革新をもたらす可能性を秘めている。感情認識技術を利用して、ユーザの感情に最適化されたUIやUXを提供し、利用者の満足度を高める。例えば、ウェブサイトやアプリケーションがユーザの感情をリアルタイムで把握し、コンテンツの提示方法やインタラクションの形式を調整する。
最終的には、このシステムが提供する感情調整機能が、人間関係の質を向上させ、コミュニケーションの効果を高めるツールとして社会に広く浸透していくことが期待される。
(Example 1)
The embodiment of the present invention is a system that includes a response means in which a generative AI responds when a call is received from an unregistered or unnotified phone number, and a transmission means in which the matter is documented in chat GPT after the call is ended and sent via a messenger app or email. Furthermore, as an emotion recognition means that combines an emotion engine that recognizes the user's emotions, the tone of the user's voice and choice of words are analyzed during the call to estimate the emotion. Based on the estimated emotion, the generative AI's response and the expression of the documented matter are adjusted. For example, if the user shows an anxious emotion, a calmer expression is used to provide a sense of security.
The response means is realized by the specific processing unit 290 of the data processing device 12, and the generative AI responds to an incoming call from an unregistered or unnotified phone number. The transmission means is realized by the specific processing unit 290 of the data processing device 12 as a function of documenting the matter in chat GPT after the call is ended and sending it by messenger app or email. The emotion recognition means is realized by the specific processing unit 290 of the data processing device 12 or the control unit 46A of the smart glasses 214 as a function of analyzing the tone of the user's voice and choice of words during the call and estimating emotions. The function of adjusting the response of the generative AI and the expression of the documented matter based on the estimated emotion is also realized by the specific processing unit 290 of the data processing device 12 or the control unit 46A of the smart glasses 214. The correspondence between each means and the device or control unit is not limited to the above example, and various changes are possible.
The response means includes a dialogue generation means that combines natural language processing and speech synthesis. The dialogue generation means automatically responds to unregistered or unnotified incoming calls, enabling a dialogue with the user. When an unregistered or unnotified incoming call is detected, the generative AI responds using a pre-trained conversation model. This conversation model is trained on a variety of dialogue data so that it can handle various scenarios, and provides appropriate answers and guidance in response to the caller's questions and requests. In addition, the response means has an AI dialogue generation function, and can automatically provide an appropriate response to an incoming call from an unregistered or unnotified phone number. This function combines a dialogue management function for understanding the caller's purpose and request at the early stage of the call and providing a corresponding response, and a voice generation function for converting the generated response into a natural voice. The dialogue management function can analyze the caller's intention based on the detection of specific keywords and phrases and generate an appropriate response. The voice generation function can convert a text-based response into voice in real time, providing the caller with a natural conversation experience.
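As a non-limiting illustration, the keyword-based intent analysis of the dialogue management function might be sketched as follows. The keyword table, the intent names, and the reply texts are all hypothetical; the description does not list concrete keywords, and an actual system would pass the detected intent to the generative AI rather than return a fixed reply.

```python
# Hypothetical keyword table; the description does not list concrete keywords.
INTENT_KEYWORDS = {
    "emergency": ("urgent", "emergency", "accident"),
    "delivery": ("delivery", "package", "parcel"),
    "sales": ("discount", "campaign", "subscription"),
}

# Hypothetical canned replies standing in for generated responses.
REPLIES = {
    "emergency": "I understand this is urgent. Please tell me what happened.",
    "delivery": "Thank you. Please tell me the delivery password.",
    "sales": "Thank you for calling. Please leave the details and I will pass them on.",
    "unknown": "Hello. May I ask who is calling and what this is about?",
}


def detect_intent(utterance: str) -> str:
    """Return the first intent whose keyword appears in the utterance."""
    text = utterance.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return "unknown"


def respond(utterance: str) -> str:
    """Pick the reply that corresponds to the detected intent."""
    return REPLIES[detect_intent(utterance)]
```

Because the table is checked in insertion order, emergency keywords take priority over the others in this sketch.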
The transmission means includes a documentation means that uses a chatbot and natural language understanding technology. After the call has ended, the contents of the call are accurately documented using an advanced natural language understanding model such as chat GPT. The documentation means has the ability to extract and summarize the main points of the call, and can generate a document that conveys the matter concisely and clearly. The generated document is sent to the user via a messenger app or email. This process includes a function for linking with the user's email address or messenger account, and the document is automatically sent in an appropriate format. The transmission means can also convert the contents of the call into text and provide it in a form accessible to the user. After the call has ended, the call content documentation function is activated and can extract and summarize the main points of the conversation. This summarized text is sent to the user's designated email address or messenger app through an automatic delivery function. This automatic delivery function includes an email sending function and an app linking function for properly formatting the document and ensuring that it is delivered to the designated destination.
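As a non-limiting illustration, the formatting step of the automatic delivery function might be sketched with the Python standard library as follows. The function name, the addresses, and the subject format are hypothetical, and the sketch only builds the message; the summarization itself and the actual sending (for example over SMTP) are outside its scope.

```python
from email.message import EmailMessage


def build_summary_email(sender: str, recipient: str, caller: str,
                        points: list[str]) -> EmailMessage:
    """Format the main points of a call as a plain-text email message."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"Call summary: {caller}"
    # One bullet per extracted point keeps the document short and scannable.
    body = "Main points of the call:\n" + "\n".join(f"- {p}" for p in points)
    message.set_content(body)
    return message
```

For example, `build_summary_email("ai@example.com", "user@example.com", "unknown number", ["Asked to call back"])` returns a message object that a mail client library can then transmit.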
The emotion recognition means includes a voiceprint analysis means for performing voice analysis, and a language analysis means for estimating emotions from the choice of words. The voiceprint analysis means can detect characteristics of the user's voice, such as tone, pitch, and speed, using the microphone of the smart device, and can estimate the user's emotional state from these voice characteristics. The language analysis means can analyze the content of the user's speech and extract emotional context from the words and phrases used. These analysis results are used by the generative AI when making a response or adjusting the expression of documented matters. For example, if the user shows anxiety, the expression of the response or document is adjusted to be more gentle and reassuring. In addition, the emotion recognition means has a function of estimating emotions from the characteristics of the user's voice and language use during a call. The voiceprint analysis function can extract characteristics such as the pitch, strength, and speed of the voice from the voice data collected by the microphone, and can estimate emotions by analyzing these characteristics. The language emotion analysis function can process the language data during a call, analyze the emotional meaning of the words and phrases used, and grasp the user's emotional state. These analytics can be used to adjust the tone of the AI's responses and the wording of documented calls to provide an appropriate emotional response to the user.
Data collection means that do not involve sensors include text data entered by users themselves and feedback on system usage. These are provided to the system through the user input acceptance function and the feedback collection function, and are used as a valuable source of information for service improvement.
These means are intended to respond quickly and effectively to user requests and improve the quality of communication. In addition, the implementation of each means is flexibly performed by the data processing device or the control unit of the smart device, and can be modified in various ways to improve the efficiency and usability of the system.
As an additional function, the system of the present invention can include a summary prediction means for inferring the caller's intention before answering an unregistered or unnotified incoming call. This allows the response means to generate a more accurate dialogue, realizing a meaningful exchange for the user. In addition, when the generative AI responds, it can have a language selection function based on the caller's country or region, enabling automatic response in multiple languages.
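As a non-limiting illustration, the language selection function might be sketched as a lookup on the international calling code of the incoming number. The mapping table and the default language are hypothetical; the description does not state how the caller's country or region is determined, and an unnotified call carries no number, so the sketch falls back to a default in that case.

```python
from typing import Optional

# Hypothetical mapping from international calling code to response language.
LANGUAGE_BY_CALLING_CODE = {"+81": "ja", "+1": "en", "+33": "fr", "+49": "de"}


def select_language(phone_number: Optional[str], default: str = "ja") -> str:
    """Choose a response language from the calling code, if a number is available."""
    if not phone_number:
        return default  # unnotified call: no number to inspect
    # Check longer codes first so that a short code cannot shadow a longer one.
    for code in sorted(LANGUAGE_BY_CALLING_CODE, key=len, reverse=True):
        if phone_number.startswith(code):
            return LANGUAGE_BY_CALLING_CODE[code]
    return default
```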
As for the transmission means, in addition to documenting the contents of the call, a function for highlighting important keywords and phrases can be provided so that the user can grasp the document quickly. Furthermore, a function can be added that automatically generates action items based on the documented content and adds them to the user's task list.
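As a non-limiting illustration, the highlighting and action item functions might be sketched as follows. The `**` markers, the cue phrases, and both function names are hypothetical; the description does not specify how keywords are chosen or how action items are recognized.

```python
import re


def highlight(text: str, keywords: list[str]) -> str:
    """Wrap each keyword occurrence in ** markers, ignoring case."""
    if not keywords:
        return text
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return pattern.sub(lambda m: f"**{m.group(0)}**", text)


# Hypothetical cue phrases that mark a sentence as an action item.
ACTION_CUES = ("please", "call back", "by tomorrow", "need to")


def extract_action_items(summary: str) -> list[str]:
    """Return the sentences of the summary that contain an action cue."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", summary) if s.strip()]
    return [s for s in sentences if any(cue in s.lower() for cue in ACTION_CUES)]
```

The returned sentences could then be appended to the user's task list by whatever task application is linked to the system.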
The emotion recognition means can detect changes in the user's emotions during a call in real time and dynamically adjust the tone and tempo of the conversation of the response means. Protocols can also be built in that, when a particular emotion is detected, prompt the user to contact an expert who can provide special support or advice according to the emotion.
A personalization function can be added to the response means that analyzes the caller's past call history and related data to provide a more personalized response. This makes it possible to provide more relevant information to the user and increase the usefulness of the response.
Regarding the transmission means, a function can be added that suggests follow-up actions based on the documented content of the call. For example, integrating automatic registration in a calendar app for tasks and appointments mentioned in the call supports the user's time management.
Furthermore, the emotion recognition means may be capable of detecting the user's stress level or tension and providing appropriate advice on how to reduce stress or links to relaxation content, thereby supporting the user's mental health and promoting overall well-being.
The system of this embodiment may include a number of additional functions for further improvement. For example, a voiceprint recognition means can be added that identifies the caller of an unregistered or unnotified call by analyzing the caller's voiceprint and comparing it with previous call data. If the caller has previously interacted with the system, this allows the response means to respond more appropriately based on that information. The voiceprint recognition means also functions as a security measure, providing a reliable call experience for the user.
Regarding the transmission means, a function may also be considered that, when documenting the call, structures the contents of the call and arranges the text hierarchically according to the importance of the information. This allows users to grasp important information more quickly when reading the document. In addition, the documented content can be customized according to the user's preferences and past behavior patterns to provide a more personal experience.
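As a non-limiting illustration, the comparison of a caller's voiceprint with previous call data might be sketched as a cosine similarity search over stored feature vectors. The vector representation, the threshold of 0.85, and the function names are assumptions; the description does not specify how voiceprints are encoded or compared.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_voiceprint(query: list[float], enrolled: dict[str, list[float]],
                     threshold: float = 0.85) -> Optional[str]:
    """Return the best-matching stored caller, or None if nothing is similar enough."""
    best_id, best_score = None, threshold
    for caller_id, voiceprint in enrolled.items():
        score = cosine_similarity(query, voiceprint)
        if score >= best_score:
            best_id, best_score = caller_id, score
    return best_id
```

Returning `None` below the threshold lets the response means treat the caller as someone with no prior interaction.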
The emotion recognition means detects the user's stress level during a call, and if it is estimated that the user is under high stress, it provides guidance on relaxation methods and psychological support related to the content of the call, thereby providing a service that allows users to relax and reduce stress through the call.
Furthermore, a function is added to the response means that automatically suggests follow-up actions to the user based on the content of the call. For example, it supports the user's decision-making by providing links to additional information about products or services suggested during the call, or by suggesting next steps.
As an improvement to the response means, a function may be considered that automatically changes the response style according to the caller's intention. For example, if the caller indicates an emergency, the AI is adjusted to provide quick and accurate instructions. This realizes a system that can respond immediately to the caller's needs.
These additional functions are intended to improve the user experience and enable accurate understanding of call content and rapid response. Each function is expected to maximize the capabilities of data processing devices and smart device control units, further enhancing the usability of the system.
(Example 2)
The embodiment of the present invention is a system that includes an adjustment means for documenting the matter in chat GPT after the call ends, and adjusting the expression of the document using an emotion engine that recognizes the user's emotions when sending it by messenger app or email. Specifically, the documented matter is input to the emotion engine, and the user's emotions are analyzed. Based on the analysis results, the expression of the document is appropriately adjusted. For example, if the user is feeling happy or excited, the user's emotions are shared by using brighter and more lively expressions.
The adjustment means is realized by the specific processing unit 290 of the data processing device 12 as a function of documenting the matter in chat GPT after the call has ended and adjusting the expression of the document using an emotion engine that recognizes the user's emotions when sending it by a messenger app or email. The function of inputting the documented matter to the emotion engine and analyzing the user's emotions is also realized by the specific processing unit 290 of the data processing device 12. The function of appropriately adjusting the expression of the document based on the analysis result is realized by the specific processing unit 290 of the data processing device 12 or the control unit 46A of the smart glasses 214. The correspondence between each means and the device or control unit is not limited to the above-mentioned example, and various changes are possible.
The conversation content extraction means converts the contents of the call into text data by making full use of voice recognition technology. After receiving the voice signal, the conversation content extraction means uses noise reduction means to remove background noise and improve the accuracy of the voice-to-text conversion. Furthermore, the conversation content extraction means uses cutting-edge voice recognition technology to accurately convert the contents of the call into text data, and in the process, the noise reduction means removes background noise and improves the conversion accuracy.
The text processing means corrects syntactic errors in the converted text data and performs grammar checking via the grammar checking means to maintain linguistic fluency. In the text processing means, the grammar checking means also ensures correct usage in the transcribed data and makes adjustments that maintain the natural flow of the document.
The emotion recognition means includes an emotion extraction means based on text mining and emotion analysis techniques, which estimates the user's emotion from the wording and context of the generated text data. The emotion extraction means also identifies words, phrases, and grammatical patterns that express various emotions and classifies them into emotion categories such as positive, negative, and neutral. The emotion emphasis means also adjusts the tone and phrasing of the text according to the extracted emotions, making modifications to better convey the user's emotional state. The emotion extraction means also reads the user's emotion from the language patterns expressed in the document and adjusts the tone of the document based on the emotion.
The adjustment means changes the expression of the text based on the emotion data analyzed by the emotion recognition means. When a positive emotion such as joy or excitement is detected, the expression enhancement means replaces the vocabulary used with brighter and more lively words to give the message a favorable impression. When a negative emotion such as sadness or anger is detected, the expression mitigation means softens the tone of the message and selects phrases that show empathy and understanding. When the user's emotion is positive, the expression enhancement means energizes the document, and when the user's emotion shows negative emotion, the expression mitigation means uses a more gentle expression.
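As a non-limiting illustration, the expression enhancement means and the expression mitigation means might be sketched as word-level substitution tables selected by the detected emotion category. Both tables and the function name are hypothetical; the description does not specify a vocabulary, and an actual system might delegate the rewriting to a generative model.

```python
import re

# Hypothetical word tables; the description does not specify a vocabulary.
ENHANCE = {"good": "wonderful", "glad": "delighted"}
SOFTEN = {"must": "may wish to", "immediately": "when convenient"}


def adjust_expression(text: str, emotion: str) -> str:
    """Brighten wording for positive emotion, soften it for negative emotion."""
    if emotion == "positive":
        table = ENHANCE
    elif emotion == "negative":
        table = SOFTEN
    else:
        return text  # neutral: leave the document unchanged
    # A single pass over whole words avoids rewriting an earlier replacement.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, table)) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], text)
```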
The message sending means includes a function of linking with a messenger application or an email client and sends the adjusted text in an appropriate format. Furthermore, the transmission protocol selection means selects an optimal transmission protocol according to the platform and settings of the recipient and ensures delivery of the message. Furthermore, the user interface presentation means provides a preview screen for a final review of the document before sending, allowing the user to make final corrections as necessary. Furthermore, the message sending means sends the adjusted text in a format suitable for the messenger application or the email client, and the preview screen provides an interface for the user to make a final check of the document.
The above processes are realized by the control unit of the smart device or by the specific processing unit built into the data processing device, depending on the device and settings used by the user. Moreover, these means are designed as modularized components, and can be easily replaced or expanded as system components. Furthermore, the correspondence between each means is set flexibly, and various changes can be made to accommodate system upgrades and customization. Moreover, this series of processes is modularized so that it can flexibly respond to devices and environments, and is designed to facilitate system upgrades and customization.
This embodiment can be further expanded to add a function for deeper understanding of the user's emotions and improving the quality of communication. For example, by incorporating a facial expression recognition function during video chat into the emotion engine, emotions can be analyzed from visual information as well, resulting in more accurate emotion judgment. To improve the accuracy of emotion recognition, the tone and pitch of the user's voice can also be analyzed and reflected in the text. Furthermore, by analyzing the user's past communication history and response patterns, the system can learn the individual's emotional expression style and adjust the text accordingly in a more personalized manner.
The text processing means has a function to suggest phrases inspired by literary works and poetry to enhance the diversity and creativity of expressions, allowing users to express their emotions more richly. It can also select appropriate expressions according to the environment and situation in which the communication takes place, taking into account the social context and cultural background.
A function is added to the message sending means that predicts the emotional impact of the sent text on the recipient, allowing users to communicate more responsibly. In addition, the AI can predict the recipient's reaction and, based on that information, suggest the communication strategy the user should take next.
Overall, these features will help users to reflect emotions accurately and sensitively in their communications, contributing to building deeper human relationships. These advanced methods can be effectively used by individuals as well as corporate customer support and CRM systems to deepen customer relationships. Furthermore, these systems are expected to be applied in fields where human emotions play an important role, such as user education and counseling.
The system leverages an emotion engine to provide document expression adjustments based on the user's emotions. To extend this functionality, sensors are integrated to capture the user's biometric information, allowing for a more accurate reading of emotions from physiological responses such as heart rate and skin conductivity. Data from the sensors is analyzed in real time, allowing the tone of the document to be adjusted on the fly.
In addition, the system is equipped with a personalized learning function that continuously analyzes the user's daily communication to understand the user's unique expression style and preferences. This function allows the system to suggest more natural and personalized documents that reflect the user's personality.
Multilingual support is also enhanced so that emotional nuances in various languages can be captured. This function deepens understanding in international communication and among users who speak multiple languages.
To protect user privacy, a function is also added that anonymizes and encrypts emotion data to strengthen security. This allows users to use the system with peace of mind.
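As a non-limiting illustration, one step toward protecting emotion data might be sketched as replacing the user identifier with a keyed hash and dropping every field that is not needed for analysis. Strictly speaking this is pseudonymization rather than full anonymization, and encryption of the stored record is not shown. The record layout, the field names, and the key handling are assumptions that do not appear in the description.

```python
import hashlib
import hmac


def pseudonymize_user_id(user_id: str, secret_key: bytes) -> str:
    """Replace a user identifier with a keyed SHA-256 digest."""
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_record(record: dict, secret_key: bytes) -> dict:
    """Keep only the pseudonymous user key and the emotion label."""
    return {
        "user": pseudonymize_user_id(record["user_id"], secret_key),
        "emotion": record["emotion"],
    }
```

The same user always maps to the same digest under a given key, so emotion trends can still be analyzed per user without storing the original identifier.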
Aiming for applications in the fields of education and counseling, a function is developed that uses the results of emotion recognition to support communication skills training. The training program includes practicing emotional expression and learning appropriate communication techniques.
Finally, to improve the usability of the system, the user interface is made rich and intuitive. A variety of gestures and voice commands are supported so that users can easily intervene in the document adjustment process. This allows users to fine-tune the expression at their own will and achieve more personal communication.
(Example 3)
The embodiment of the present invention is a system that includes an adjustment means that uses an emotion engine that recognizes the user's emotions during a call to adjust the response and dialogue content to match the user's emotions. Specifically, the user's tone of voice and choice of words are input to the emotion engine during a call to estimate the user's emotions. The generative AI's responses and dialogue content are adjusted based on the estimated emotions. For example, if the user shows sad emotions, words of empathy are used to provide encouragement and support.
The adjustment means is realized by the specific processing unit 290 of the data processing device 12 or the control unit 46A of the smart glasses 214 as a function of adjusting the response and the content of the dialogue to match the user's emotions using an emotion engine that recognizes the user's emotions during a call. A function of inputting the user's tone of voice and choice of words into the emotion engine during a call and estimating the user's emotions is also realized by the specific processing unit 290 of the data processing device 12 or the control unit 46A of the smart glasses 214. A function of adjusting the response of the generative AI and the content of the dialogue based on the estimated emotions is realized by the specific processing unit 290 of the data processing device 12 or the control unit 46A of the smart glasses 214. The correspondence between each means and the device or control unit is not limited to the above example, and various changes are possible.
The voice data collection means includes a microphone with high sensitivity and noise canceling function. The microphone is equipped with a digital signal processing algorithm to remove ambient noise and clearly record the user's voice. The voice data collection means also employs a highly sensitive microphone that captures subtle acoustic characteristics from the user's speech and combines this with digital signal processing technology to effectively remove ambient noise. The microphone accurately captures the characteristics of the user's voice and provides basic data for detecting changes in emotions.
The voice feature extraction means extracts features that may reflect human emotions from the voice signal. This extraction means includes a spectrogram analysis function and a pitch tracking function for performing acoustic feature analysis, and an emotion identification function using an acoustic model. The voice feature extraction means also uses acoustic analysis tools such as a spectrum analysis function and a pitch analysis function to extract features that suggest emotions from the voice signal, and supplies these data to the emotion estimation model.
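As a non-limiting illustration, the pitch tracking function might be sketched with autocorrelation, which is one common pitch estimation technique. The description does not state which algorithm is used, and the search range of 75 Hz to 400 Hz and the function name are assumptions.

```python
def estimate_pitch(samples: list[float], sample_rate: int,
                   fmin: float = 75.0, fmax: float = 400.0) -> float:
    """Estimate the fundamental frequency in Hz by autocorrelation (0.0 if none found)."""
    min_lag = int(sample_rate / fmax)  # shortest period considered
    max_lag = int(sample_rate / fmin)  # longest period considered
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, min(max_lag, len(samples) - 1) + 1):
        # Correlate the signal with a copy of itself shifted by `lag` samples.
        corr = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else 0.0
```

The lag with the strongest self-similarity corresponds to one period of the voice, so the sample rate divided by that lag gives the pitch that is supplied to the emotion estimation model as a feature.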
The emotion analysis means includes an emotion estimation model based on machine learning, which estimates the user's emotional state using the features extracted by the voice feature extraction means as input. The emotion estimation model is composed of classifiers such as neural networks, support vector machines, and decision trees trained on training data, and classifies the user's emotions into categories such as positive, negative, and neutral. The emotion estimation model improves its accuracy through continuous learning, and evolves so that it can also recognize subtle changes in the user's emotions. In addition, the emotion analysis means has an emotion estimation model that utilizes machine learning technology, and analyzes the user's emotions based on the extracted voice features. This model combines various machine learning algorithms to identify emotion categories from the user's utterance, and estimates the user's emotional state based on this. The estimated emotional state is important information for adjusting the system's response to the user's utterance and behavior.
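As a non-limiting illustration, the classification of extracted features into emotion categories might be sketched with a nearest-centroid classifier. This is a deliberately simpler stand-in for the neural networks, support vector machines, and decision trees named above, and the two-dimensional feature vectors (for example, normalized pitch and speaking rate) are assumptions.

```python
import math


def train_centroids(examples: list[tuple[list[float], str]]) -> dict[str, list[float]]:
    """Average the feature vectors of each emotion label into one centroid."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(features: list[float], centroids: dict[str, list[float]]) -> str:
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))
```

Retraining the centroids as new labeled examples arrive is one simple way to realize the continuous learning mentioned above.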
The dialogue adjustment means includes a response generation function using a generative AI model, which dynamically adjusts the content of the dialogue according to the emotion estimated by the emotion analysis means. The response generation function utilizes natural language generation technology to generate a response sentence using words and tones appropriate for the user's emotion. For example, if the user is expressing sad emotion, a response including empathetic expressions and comforting words is generated. This response generation function has a context-aware processing function for taking into account the context of the conversation and providing content that matches the user's emotion and the purpose of the communication. In addition, the generated response is constructed to be natural to the user and to meet the emotional needs. The response generation function is realized by a machine learning model trained on a large-scale conversation dataset and can adapt to the user's language and speaking style. In addition, the dialogue adjustment means includes a response generation function using a generative AI model, which generates dialogue content appropriately corresponding to the estimated emotion. This function selects words and a dialogue tone that correspond to the user's emotion and provides a response that matches the user's current emotion and the context of the conversation. The response generation function is designed to perform context-aware processing and provide the information and support required by the user in an appropriate form. This feature is capable of generating responses that adapt to a user's language and emotions based on models trained from large amounts of conversational data.
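One way to picture the dialogue adjustment is as a prompt builder that prepends a tone instruction chosen from the estimated emotion and includes recent conversation context before the text is handed to a generative model. The sketch below stops short of calling any model; the instruction wording, the `build_prompt` name, and the three-turn context window are assumptions, not details from the text.

```python
from typing import Dict, List

# Hypothetical tone instructions keyed by the estimated emotion category.
TONE_INSTRUCTIONS: Dict[str, str] = {
    "negative": "Respond with empathetic, comforting wording and a calm tone.",
    "positive": "Respond warmly and keep the upbeat tone of the conversation.",
    "neutral": "Respond in a clear, polite, and matter-of-fact tone.",
}


def build_prompt(utterance: str, emotion: str, history: List[str]) -> str:
    """Compose a prompt that reflects both emotion and conversation context."""
    # Unknown categories fall back to the neutral instruction.
    tone = TONE_INSTRUCTIONS.get(emotion, TONE_INSTRUCTIONS["neutral"])
    # Only the most recent turns are kept as context (assumed window of 3).
    context = "\n".join(history[-3:])
    return (f"{tone}\nConversation so far:\n{context}\n"
            f"User: {utterance}\nAssistant:")


prompt = build_prompt("I missed the delivery again.", "negative",
                      ["turn1", "turn2", "turn3", "turn4"])
```

The resulting string would be passed to whichever generative AI model the response generation function uses.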
Data collection means that do not rely on sensors include text information entered directly into the system by users and metadata obtained from call records. These are used as supplementary information when analyzing user behavioral patterns and preferences, and contribute to improving the accuracy of the emotion engine and optimizing the response generation function. User-entered text information is also used as training data for the emotion estimation model as part of the emotion analysis means.
Because the system analyzes the user's emotions in real time during a call and automatically adjusts the dialogue content, facial expression recognition technology can be integrated in addition to voice data to capture even more subtle changes in emotion. A webcam or the camera of a smart device is used to analyze the user's facial expressions and improve the accuracy of emotion analysis. With this added facial expression recognition function, the user's emotions are recognized more accurately, and the dialogue can be adjusted in response to even more subtle emotional changes.
It is also conceivable to incorporate wearable devices to capture the user's physiological signals. Measuring physiological responses such as heart rate and electrodermal activity can provide deeper insight into emotions that cannot be captured by tone of voice or facial expressions alone. Integrating this data can further improve the accuracy of emotion analysis and generate more appropriate dialogue responses.
It would also be effective to add customization features to the dialogue adjustment means that take into account the user's cultural background and personal values. A user profile can be constructed, and the tone and content of the dialogue can be further personalized based on that profile. This makes it possible to provide detailed support tailored to each individual user.
Furthermore, to promote the evolution of the emotion estimation model, a mechanism may be built that collects emotion data through crowdsourcing and incorporates feedback from a variety of users to update the model continuously. The result is a flexible system that can handle a variety of emotional expressions and languages.
Applications in the fields of education and mental health care can also be considered. For example, in the field of education, it could be used to present educational materials adapted to students' emotions or in counseling sessions. In mental health care, it could recognize the user's emotions and provide dialogue support to reduce stress and anxiety. This would enable a more effective approach to the problems the user is facing.
Regarding privacy protection, security measures are strengthened so that users' emotional data is stored safely, and the risk of information leakage is minimized through appropriate access control and encryption technology. This allows users to use the system with peace of mind.
Ultimately, it is hoped that the personalized interaction experience provided by this system will be developed into services that improve the quality of users' lives.
The embodiment of the present invention is applicable not only to telephone calls but also to video conferencing and online education platforms. For example, a lecturer can grasp the emotions of students in real time and adjust the progress of the curriculum according to the emotions, thereby providing a more effective learning experience. Even in video conferencing, dialogue management that reflects the emotions of participants is performed, promoting a productive and positive meeting environment.
The system could also be enhanced to monitor health conditions based on emotional responses: for example, if a user's tone of voice or manner of speaking indicates negative emotions over a period of time, a mental health professional could be notified to intervene if necessary.
Furthermore, to improve the emotion engine, a function may be incorporated that analyzes the emotional patterns of the user's daily life and provides advice on long-term emotion management and stress reduction based on that information. By analyzing the user's daily rhythm and activity patterns and predicting emotional ups and downs, the system suggests content for relaxation and motivation at appropriate times.
This system is also expected to be applied in the field of customer support. For example, if call center operators could grasp the customer's emotions in real time and detect negative emotions such as dissatisfaction or anger, they could take immediate action to address the issue, contributing to improving customer satisfaction.
In the fields of games and entertainment, it is expected that dynamically changing content according to the user's emotions will have the effect of increasing immersion and enjoyment. In-game characters will react to the player's emotions, changing the story development and dialogue content, creating a more personalized experience.
In addition, voice assistants and virtual reality (VR) may be integrated to realize more natural dialogue based on the user's emotions. A voice assistant understands the user's emotions and provides information and services tailored to individual needs. In a VR environment, the scenario and environment change according to the user's emotions, providing an experience that matches those emotions in real time.
This system also has the potential to revolutionize the design of user interfaces (UI) and user experiences (UX). Using emotion recognition technology, it can provide UI and UX optimized for the user's emotions, increasing user satisfaction. For example, websites and applications can grasp the user's emotions in real time and adjust the way content is presented and the form of interaction.
Ultimately, it is hoped that the emotion regulation function provided by this system will become widely used throughout society as a tool to improve the quality of human relationships and increase the effectiveness of communication.

特定処理部２９０は、特定処理の結果をスマート眼鏡２１４に送信する。スマート眼鏡２１４では、制御部４６Ａが、スピーカ２４０に対して特定処理の結果を出力させる。マイクロフォン２３８は、特定処理の結果に対するユーザ入力を示す音声を取得する。制御部４６Ａは、マイクロフォン２３８によって取得されたユーザ入力を示す音声データをデータ処理装置１２に送信する。データ処理装置１２では、特定処理部２９０が音声データを取得する。 The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating a user input for the result of the specific processing. The control unit 46A transmits audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

上記実施形態では、データ処理装置１２によって特定処理が行われる形態例を挙げたが、本開示の技術はこれに限定されず、スマート眼鏡２１４によって特定処理が行われるようにしてもよい。 In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology disclosed herein is not limited to this, and the specific processing may also be performed by the smart glasses 214.

［第３実施形態］ [Third embodiment]

図５には、第３実施形態に係るデータ処理システム３１０の構成の一例が示されている。 Figure 5 shows an example of the configuration of a data processing system 310 according to the third embodiment.

図５に示すように、データ処理システム３１０は、データ処理装置１２及びヘッドセット型端末３１４を備えている。データ処理装置１２の一例としては、サーバが挙げられる。 As shown in FIG. 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

ヘッドセット型端末３１４は、コンピュータ３６、マイクロフォン２３８、スピーカ２４０、カメラ４２、通信Ｉ／Ｆ４４、及びディスプレイ３４３を備えている。コンピュータ３６は、プロセッサ４６、ＲＡＭ４８、及びストレージ５０を備えている。プロセッサ４６、ＲＡＭ４８、及びストレージ５０は、バス５２に接続されている。また、マイクロフォン２３８、スピーカ２４０、カメラ４２、及びディスプレイ３４３も、バス５２に接続されている。 The headset type terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication I/F 44, and a display 343. The computer 36 includes a processor 46, a RAM 48, and a storage 50. The processor 46, the RAM 48, and the storage 50 are connected to a bus 52. The microphone 238, the speaker 240, the camera 42, and the display 343 are also connected to the bus 52.

図６には、データ処理装置１２及びヘッドセット型端末３１４の要部機能の一例が示されている。図６に示すように、データ処理装置１２では、プロセッサ２８によって特定処理が行われる。ストレージ３２には、特定処理プログラム５６が格納されている。 Figure 6 shows an example of the main functions of the data processing device 12 and the headset type terminal 314. As shown in Figure 6, in the data processing device 12, a specific process is performed by the processor 28. A specific process program 56 is stored in the storage 32.

ヘッドセット型端末３１４では、プロセッサ４６によって受付出力処理が行われる。ストレージ５０には、受付出力プログラム６０が格納されている。プロセッサ４６は、ストレージ５０から受付出力プログラム６０を読み出し、読み出した受付出力プログラム６０をＲＡＭ４８上で実行する。受付出力処理は、プロセッサ４６がＲＡＭ４８上で実行する受付出力プログラム６０に従って、制御部４６Ａとして動作することによって実現される。 In the headset terminal 314, the reception output process is performed by the processor 46. A reception output program 60 is stored in the storage 50. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output process is realized by the processor 46 operating as the control unit 46A in accordance with the reception output program 60 executed on the RAM 48.

（形態例１）
本発明を実施するための形態は、未登録または非通知の電話番号からの着信時に、生成系AIが応答する応答手段と、通話終了後に用件をチャットGPTで文書化し、メッセンジャーアプリやメールで送信する送信手段とを具備するシステムである。さらに、家族や知人と名乗る場合には、専用の質問や合言葉を使用して確認する確認手段と、配送業者は業者用の合言葉を使用して本人につながる接続手段と、通話中には電話番号をチェックし、悪用歴のある番号の場合は警察に連携する連携手段とを具備する。
応答手段は、データ処理装置１２の特定処理部２９０によって実現され、未登録または非通知の電話番号からの着信時に生成系AIを用いて応答する。送信手段は、通話終了後に用件をチャットGPTで文書化し、メッセンジャーアプリやメールで送信する手段として、データ処理装置１２の特定処理部２９０によって実現される。確認手段は、専用の質問や合言葉を使用して家族や知人を確認する手段として、ヘッドセット型端末３１４の制御部４６Ａによって実現されてもよい。接続手段は、配送業者が業者用の合言葉を使用して本人につながる手段として、ヘッドセット型端末３１４の制御部４６Ａによって実現されてもよい。連携手段は、通話中に電話番号をチェックし、悪用歴のある番号の場合は警察に連携する手段として、データ処理装置１２の特定処理部２９０によって実現される。各手段と装置や制御部との対応関係は、上述した例に限定されず、種々の変更が可能である。
応答手段は、音声認識技術を活用した発話理解機能が含まれ、未登録または非通知の番号からの着信に対して、事前に訓練されたAIが生成した自然言語による応答を行う。このAIは、大量の通話データを学習しており、さまざまなシナリオに対応できる回答を生成する能力を有する。着信があると、AIはリアルタイムで着信内容を解析し、適切な応答を選択して会話を進める。応答内容は、通話の性質に合わせてカスタマイズされるため、営業電話からの質問には丁寧に対応し、不審な着信に対しては慎重に応答する。また、応答手段は、未登録または非通知の電話番号からの着信に対応するため、音声合成技術と発話内容理解技術を駆使し、事前に訓練を受けたAIが適切な対応を行うこともできる。このAIは複数の応答シナリオを学習し、通話の意図を把握して、状況に応じた自然な対話を提供することができる。着信内容に基づいてAIが瞬時に応答を選択し、通話の趣旨に合わせてカスタマイズされた対応を行うため、営業電話の問い合わせには適切に対処し、怪しい着信には慎重に応じることができる。
送信手段には、通話内容をテキスト化する音声認識技術と、そのテキストを基に通話の要約を生成する自然言語生成技術が含まれる。通話が終了すると、AIは通話内容をテキストデータに変換し、その内容を要約して文書化する。生成された文書は、メッセンジャーアプリやメールなどのコミュニケーション手段を介して、ユーザーに送信される。このプロセスにより、ユーザーは通話の詳細を後からでも確認でき、重要な情報を見逃すことがない。また、送信手段では、終了した通話の内容を音声認識技術によりテキストに変換し、その後、自然言語処理技術を用いて要約を行い、生成された文書をメッセンジャーアプリやメールでユーザーに送付することもできる。ユーザーは送付された文書を介して通話の詳細をいつでも確認可能で、重要な情報が記録され、見落とされることがなくなる。
確認手段は、家族や知人からの着信と名乗る場合に、特定の質問や合言葉を使用して本人確認を行う機能を有する。この手段は、ユーザーが設定した質問や合言葉をAIが使用し、着信者が正しい応答を行うことで本人であることを確認する。本人確認が成功すると、AIは通話を継続し、必要に応じてユーザーに転送する。このプロセスにより、不正アクセスや詐欺を防ぎながら、本人であれば円滑な通信を確保する。また、確認手段は、家族や知人を名乗る着信者に対して、ユーザーが設定した質問や合言葉をAIが提示し、正確な回答が得られた場合にのみ、通話を継続またはユーザーに転送することもできる。この機能により、不正なアクセスや詐欺を未然に防ぎつつ、本人であることが確認された場合にはスムーズなコミュニケーションを実現する。
接続手段は、配送業者などの特定の職種の者が業者用の合言葉を使用することで本人と通話ができる機能を有する。この手段では、AIが業者用の合言葉を受け取り、正しい合言葉であることを確認した後、通話を本人に繋ぐ。業者用の合言葉は、通話のセキュリティを確保するために重要であり、誤った合言葉が入力された場合、通話は繋がらない。また、接続手段は、配送業者などの職種固有の合言葉をAIが受け取り、その正当性を確認した上で、通話をユーザーにつなぐ機能を担うこともできる。この合言葉によって、通話のセキュリティが確保され、誤った合言葉の場合には通話が切断される仕組みとなっている。
連携手段は、通話中に電話番号をリアルタイムでチェックし、その番号に悪用歴がある場合は自動的に警察や関連機関に連携する機能を有する。この手段は、データベースに蓄積された不審な番号のリストを参照し、着信番号がそのリストに含まれているかを確認する。不審な番号からの通話であると判断された場合、システムは即座に警察への報告プロトコルを開始し、状況に応じて適切な対応を取る。また、連携手段は、通話中の番号をリアルタイムで監視し、過去に悪用された履歴がある番号であれば自動的に警察などの関連機関と連携する機能を果たすこともできる。不審な番号の検出時には、警察への通報プロトコルが起動し、迅速な対応が取られる。
これらの手段は、ユーザーのセキュリティと利便性を高めるために、複数のAI技術と連携機能を組み合わせて実装される。データ処理装置とスマートデバイスの制御部が協力し、これらの手段を柔軟に実現するためのプラットフォームが提供される。各手段は、独立して機能するだけでなく、連動してより高度なサービスを提供するために設計される。ユーザーは、これらの手段を通じて、未登録または非通知の電話番号からの着信に対する対応を自動化し、日常生活の中で発生する潜在的なリスクから自己を守ることができる。また、これらの手段は、AI技術を中心に構築され、ユーザーのセキュリティを向上させると同時に、日常のコミュニケーションを効率化する。センサーを使用しないデータ収集の代替手段として、ユーザーが手動で情報を入力し、システムがその情報を利用するプロセスが可能である。ユーザーはこれらの手段によって、未登録または非通知の着信に対する対処を自動化し、不測の事態からの保護を強化することができる。
本形態例のシステムには、さらなる機能を追加し、その利便性を高めることができる。例えば、応答手段には、会話中にユーザーの感情を解析する機能を追加し、通話相手の声のトーンや話し方から感情を推定し、それに応じた応答をAIが行う。これにより、通話相手が怒りや不安を感じている場合は、より慎重かつ共感的な応答が可能となり、通話の品質を向上させる。
送信手段には、文書化した通話内容をユーザーのカレンダーやリマインダーに自動的に統合する機能を追加する。これにより、通話で得たアポイントメントやタスクを忘れずに管理できるようになり、生産性の向上を図ることができる。
確認手段は、ユーザーの生体情報を利用して本人確認を行う機能を備える。例えば、スマートウォッチやフィットネストラッカーから得た生体情報を用いて、通話相手が実際に家族や知人であることを確認する。これにより、より高度なセキュリティを実現しつつ、利用者の手間を減らすことができる。
接続手段には、配送業者がQRコード（登録商標）またはNFCタグをスキャンすることで認証を行い、通話を本人に直接繋ぐ機能を追加する。これにより、合言葉のやり取りを省略し、より迅速かつ安全に配送業者との連携を図ることが可能となる。
連携手段では、通話中の音声データから不審なキーワードを検出し、その内容を自動で分析する機能を追加する。キーワードに基づいて不審な通話と判断された場合、警察への通報を行う前に、まずはユーザーに警告することで、より正確な判断をサポートする。
これらの追加機能により、本形態例のシステムは、通話の自動応答だけでなく、日々の生活の中でのセキュリティやスケジュール管理をより効率的にサポートする。また、各種のAI技術とデータベースの連携によって、ユーザーの生活に更なる安心と便利さを提供する。さらに、これらの技術をユーザーのスマートデバイスや家庭内のIoT機器と統合することで、よりパーソナライズされた体験を実現することが可能となる。
本形態例のシステムには、さらなる機能拡張が可能であり、利便性およびセキュリティを高めるために応答手段には、発信者の声紋を分析し、登録済みの家族や知人の声と照合する機能を追加することができる。これにより、発信者が本人であるかをより正確に判断し、特定の質問や合言葉を使用しなくても安全に通話を継続することが可能になる。
送信手段では、音声認識と自然言語処理を活用して得られた通話内容を、ユーザーが選択した複数の言語に翻訳し、異なる言語を話すユーザー間のコミュニケーションを支援する機能を組み入れることができる。これにより、国際的なビジネスシーンや多言語を話す家庭内での利用が促進される。
確認手段には、ユーザーが特定のジェスチャーや動作をカメラに示すことで本人確認を行う機能を追加することができる。この機能により、通話中に手軽にかつ迅速に本人確認を行うことが可能となり、セキュリティが一層強化される。
接続手段に関しては、AIが発信者の位置情報と配送予定情報を照合し、本人確認を行う機能を追加することで、配送業者が実際に商品を配送している場所から通話していることを確認し、通話の真正性を保証することが可能になる。
連携手段では、不審な通話が検出された際に、警察だけでなく、ユーザーの指定した緊急連絡先にも自動通報する機能を追加することで、万が一の緊急時に迅速な対応が行えるようになる。
これらの機能追加は、既存のシステムの基本的な構造を維持しつつ、ユーザーのニーズに合わせて柔軟な対応が可能となる。また、スマートデバイスや家庭内のIoT機器と連携することで、ユーザーが日常的に使用する環境においてもシームレスな経験を提供する。これらの追加機能により、本形態例のシステムは通話の自動応答を超え、日常生活のあらゆる面でユーザーのセキュリティと利便性を向上させる。
（形態例２）
本発明を実施するための形態は、通話終了後に用件をチャットGPTで文書化し、メッセンジャーアプリやメールで送信する際に、家族宛に送信可能な送信手段を具備するシステムである。具体的には、通話終了後に生成された文書を選択し、送信先を家族と指定することで、家族宛に用件を送信することができる。
送信手段は、通話終了後に用件をチャットGPTで文書化し、メッセンジャーアプリやメールで家族宛に送信する手段として、データ処理装置１２の特定処理部２９０によって実現される。また、送信手段は、ヘッドセット型端末３１４の制御部４６Ａによって実現されてもよい。各手段と装置や制御部との対応関係は、上述した例に限定されず、種々の変更が可能である。
通話終了後の内容を文書化する手段は、音声認識技術を用いた会話内容解析手段を含む。この解析手段は、通話の音声データをテキストに変換し、通話の要点を抽出する。また、音声認識技術を用いた会話内容解析手段は、通話が終了した後に発生する音声データをテキストに変換し、そのテキストデータから通話の主要なポイントを抽出する機能を持つ。音声認識技術は、ディープラーニングに基づくモデルを活用し、さまざまな言語やアクセントに対応するためのトレーニングが行われる。また、この解析手段は、多様な言語やアクセントに対応するためにディープラーニングモデルを活用し、大量の音声データを基に学習を進める。会話内容解析手段は、形態素解析や構文解析を行い、会話の中で交わされた重要な情報や行動を要する内容を特定する。特に、キーワード抽出機能を用いて会話の中で頻繁に使われる単語やフレーズを特定し、それらを基に通話の要点を文書化する。また、会話の要点を明確にするために、形態素解析機能と構文解析機能を組み合わせ、会話中に交わされた重要な情報やアクションを要する内容を特定する。キーワード抽出機能は、会話の中で頻出する単語やフレーズを識別し、それらの情報を基に文書化する。
文書化された内容をメッセージとしてフォーマットする手段は、テキストエディタ機能を含む。この機能は、解析されたテキストを整理し、文書の構造を整える。メッセージフォーマット手段は、ユーザが容易に内容を確認し、必要に応じて編集や追加情報を加えることができるインタフェースを提供する。また、文書化された内容をメッセージ形式に整える手段には、テキストエディタ機能が含まれ、解析されたテキストを整理し、文書のレイアウトを整える。ユーザがメッセージの内容を確認し、編集や追加情報を加えることができるインターフェイスを提供する。
送信手段は、メッセージを宛先指定機能を用いて特定の家族に送信する。この宛先指定機能は、ユーザの連絡先リストと連携し、選択された家族メンバーの連絡先情報に基づいてメッセージを自動的に送信する。また、送信手段は、メッセージ送信の確認と送信履歴を管理する機能も備えており、ユーザは送信されたメッセージの状態を追跡し、必要に応じて再送信を行うことができる。メッセージの送信は、メッセンジャーアプリやメールアプリとの連携によって行われる。メッセンジャーアプリやメールアプリとの連携機能は、ユーザのアカウント情報を用いて認証を行い、安全にメッセージを送信する。また、送信手段は、セキュリティ対策としてメッセージの暗号化や送信時の認証プロセスを実施し、プライバシーの保護を確保する。また、送信手段は、ユーザが選択したメッセージ送信方法に応じて、メッセージを適切なフォーマットで送信する。例えば、メッセンジャーアプリではインスタントメッセージとして、メールアプリでは電子メールとして送信する。メッセージの送信手段は、ユーザが選択した家族メンバーに対してメッセージを送るための宛先指定機能を備えており、ユーザの連絡先リストと連携して、家族メンバーの連絡先に基づいたメッセージの自動送信を行う。送信手段には、メッセージの送信状況を確認し、送信履歴を管理する機能も含まれており、ユーザは送信したメッセージの状態を追跡し、必要に応じて再送信を行うことができる。メッセージの送信プロセスは、メッセンジャーアプリやメールアプリと連携し、ユーザのアカウント情報を基に認証を行い、安全にメッセージを送信する機能を持つ。送信手段は、メッセージの内容を暗号化し、送信時の認証を行うセキュリティ対策を施して、プライバシーを守る。また、メッセンジャーアプリではインスタントメッセージとして、メールアプリでは電子メールとして、ユーザが選択した送信方法に応じて適切なフォーマットでメッセージを送信する。
センサーを含まないデータ収集の例としては、ユーザが手動で通話内容をメモする場合が考えられる。この場合、ユーザは通話終了後に自分で通話の要点を記録し、そのテキストデータをメッセージとして家族に送信する。ユーザが手動で記録した通話内容は、テキストエディタ機能を用いて整理され、フォーマットされたメッセージとして送信手段によって家族宛に送信される。この手動記録は、音声認識技術を用いた自動文書化が適用できない状況や、ユーザが特定の情報を自らの言葉で伝えたい場合に適している。また、センサーを用いないデータ収集の方法として、ユーザが通話終了後に手動で通話内容をメモし、その情報をメッセージとして家族に送るシナリオも考えられる。この手動記録は、テキストエディタ機能を用いて整理され、フォーマットされたメッセージとして送信される。音声認識技術が適用できない状況や、ユーザが自らの言葉で特定の情報を伝えたい場合に利用される。
本発明の実施形態では、通話内容の文書化を超えた機能を考慮することができる。例えば、音声認識によって生成されたテキストに基づき、スケジュール管理システムと連携して、通話中に言及されたアポイントメントや予定を自動的にカレンダーに登録する機能を追加する。これにより、ユーザーは通話後に手動でスケジュールを管理する手間を省くことができる。さらに、家族間で共有されるカレンダーへの予定登録を提案し、家族全員が予定を共有しやすくする。
また、文書化されたテキストデータを基に、自動的にタスクリストを生成し、家族全員がアクセスできる共有プラットフォームに投稿する機能を設けることも可能である。このプラットフォームでは、各家族メンバーがタスクの進捗を更新したり、完了したタスクにチェックを入れることができ、家族全員で情報を共有し協力する環境を構築する。
さらに、文書化されたメッセージに対して、感情分析を行い、通話中の感情的なニュアンスをテキストに反映させる機能を追加することで、メッセージの受取人が発信者の意図をより正確に理解することを助ける。例えば、通話中に喜びや心配といった感情が表れた場合、その感情をテキストに特定の絵文字やフォーマットで表現し、コミュニケーションの豊かさを高める。
また、音声認識と解析を活用して、通話内容から自動的にFAQやよくある質問リストを生成し、家族が同様の問い合わせをする際に参照できる知識ベースを構築する機能も考えられる。この知識ベースは、家族内で共有され、新たな通話が発生するたびに更新されることで、家族間のコミュニケーションの効率を向上させる。
さらに、通話終了後に生成されるテキストは、メッセンジャーアプリやメールで送信するだけでなく、音声形式で再生する機能を付加することで、視覚障害のある家族メンバーや読み書きが苦手な子供でも情報を容易に受け取れるようにする。
最後に、通話終了後に文書化された内容を、家族メンバーのプライバシーを保護するために、文書内の機微な情報を識別し、自動的に匿名化や伏せ字処理を行う機能を組み込むことで、安心して情報を共有できる環境を提供する。これにより、個々のプライバシーを尊重しつつ、必要な情報のみを共有するバランスを保つことができる。
本発明の実施形態は、通話内容の自動文書化と送信に関するものであり、これに新たな機能を追加することが考えられる。例えば、通話内容を分析し、通話終了後に自動でアクションアイテムを生成し、対応が必要なタスクとしてユーザーのスマートデバイスにリマインダーをセットする機能が考えられる。このリマインダーは、通話で言及された期限や重要性に基づいて優先度を設定し、ユーザーが忘れずに行動に移せるようサポートする。
さらに、家族間でのコミュニケーションを強化するために、文書化されたメッセージ内の特定の単語やフレーズに基づいて、関連する画像やビデオ、リンクを自動的に添付する機能を追加することも有益である。これにより、テキストベースのメッセージだけでなく、視覚的な情報も共有でき、コミュニケーションがより豊かになる。
また、通話内容の文書化に際して、プライバシーに配慮し、特定の個人情報や機密情報を自動的に検出し、ブラー処理や伏せ字に変換する機能を実装することで、セキュリティを高めることができる。このプロセスは、自然言語処理技術とプライバシー保護のガイドラインに従って行われる。
通話内容のテキスト化では、ユーザーの多様なニーズに対応するため、複数の言語への翻訳機能を組み込むことも有効である。家族が異なる言語を話す多文化の環境では、通話内容を自動的に翻訳し、各メンバーが理解しやすい言語でメッセージを送信することが可能となる。
さらなる利便性を追求するために、メッセージの送信タイミングをユーザーがカスタマイズできるスケジュール機能を追加する。ユーザーは、即時送信だけでなく、特定の日時にメッセージを送信するよう設定できるため、家族が情報を受け取るタイミングを最適化できる。
最後に、メッセージの受け取り側で、受信したメッセージに対するアクションを簡単にとれるよう、返信や確認のためのクイックアクションボタンを設けることで、迅速なフィードバックと効率的なコミュニケーションを実現する。これにより、家族間での情報共有がさらにスムーズに行われるようになる。
（形態例３）
本発明を実施するための形態は、通話中に電話番号をチェックし、悪用歴のある番号の場合には警察に連携する際に、警察との連携手段を具備するシステムである。具体的には、通話中に着信番号をデータベースと照合し、悪用歴のある番号であることを検出した場合には、自動的に警察に通報する機能を備えている。
連携手段は、通話中に電話番号をチェックし、悪用歴のある番号の場合には警察に連携する手段として、データ処理装置１２の特定処理部２９０によって実現される。また、連携手段は、ヘッドセット型端末３１４の制御部４６Ａによって実現されてもよい。各手段と装置や制御部との対応関係は、上述した例に限定されず、種々の変更が可能である。
抽出手段は、通信網を介して発信される各通話の信号に含まれる発信者情報を抽出するために備わっており、デジタル信号処理技術を用いて通話データから発信者の電話番号を正確に取得することができる。また、抽出手段は、Caller ID情報を解析し、電話番号を特定するための信号解析機能を持っている。このシステムは、通話信号の中から発信者の電話番号を抽出するための信号抽出機能と、Caller ID情報の解読に特化した解析アルゴリズムを利用して、着信番号を正確に特定することもできる。
照合手段は、抽出された電話番号を不正利用が疑われる電話番号を収集し、カテゴリ別に整理したデータベースに照合する機能を有しており、データベース管理システムを通じてリアルタイムでの照合処理が可能であり、高速な検索アルゴリズムとインデックス技術により、通話が進行している間に迅速な照合が行われる。また、照合手段は、悪用歴のある番号をリスト化したデータベースを参照し、迅速な検索と照合を行うための高性能なデータベース検索機能とインデックス機能を備えている。
連携手段は、悪用歴のある番号が検出された場合に自動的に警察に通報する機能を持ち、通報システムとのインターフェイス機能が含まれ、通報する際の警察の受付システムとのプロトコルに基づいたデータ形式で通報情報を生成し、安全な通信チャネルを用いて警察の受付システムに送信する。通報情報の内容には、検出された悪用歴のある電話番号、通話の日時、通話の持続時間、発信者が利用している通信事業者などの情報が含まれ、個人情報の保護や通報の正確性を確保するために、暗号化技術や認証システムが用いられる。また、連携手段は、複数の通信プロトコルや通報システムとの互換性を持ち、システム間のデータ交換を円滑に行うためのアダプター機能を設け、通報のプロセスにおいて、警察の受付システムの要件に合わせて通報情報のフォーマットを調整し、適切な通報プロトコルを選択して通報を行う機能を持つ。通報プロセスが発動されると、システムは警察の受付システムに対して、通報情報を送信し、通報の受付確認を取得する。この確認は、通報が正しく行われたことをシステムが記録し、通報履歴として保存するための情報として利用される。通報履歴は、将来的な分析や改善のために用いられ、通報プロセスの効率化や精度向上に寄与する。この連携機能は、通報データ生成機能により、必要な情報を含む通報フォーマットを作成し、セキュアな通報送信機能を介して警察の受付システムへ情報を送信する。通報のセキュリティと信頼性を保証するため、データの暗号化機能とシステム認証機能が導入されている。また、システムの連携手段は、複数の通報プロトコルと互換性を持ち、異なる通報システムとのデータ交換を実現するアダプター機能を有しており、このアダプター機能は、通報プロトコル選択機能により、通報時のプロトコル要件に適した形式に自動的に調整し、通報情報の送信と受付確認を行う。通報履歴記録機能は通報の成功を記録し、システムのパフォーマンス分析や改善に使用される。
データ収集手段には、センサーを用いない例として、ユーザがアプリケーションやウェブインターフェース上で疑わしい通話に関する報告を行う機能があり、ユーザは、通話の経験や通話中に感じた不審な点をフォームに入力し、その情報がデータベースに登録される。この手動報告により収集されたデータは、自動的にデータベースに照合される電話番号のリストに追加される可能性があり、悪用歴のある番号の検出精度の向上に寄与する。また、センサーを使用しないデータ収集例として、ユーザが疑わしい通話について報告するための入力機能が提供される。ユーザはインタラクティブな報告フォームを通じて、疑わしい通話の内容をデータベースに登録し、これにより収集された情報は悪用歴のある番号の検出に活用される。この手動報告システムは、ユーザの経験と感覚に基づいて追加データを提供し、番号照合データベースの拡張に貢献する。
このシステムには、データベースの更新メカニズムを強化する機能を追加することができる。例えば、新たに悪用が確認された番号は、通報後も自動的にデータベースに追加される。さらに、疑わしい通話が報告された際には、その番号の信用情報を他のデータベースとも照合し、ユーザーからの報告に基づく情報と組み合わせることで、より正確な悪用歴の特定を実現する。データベースの整合性を保つために、定期的なクリーニングプロセスを実行し、誤った情報や古いデータを排除する仕組みも設けられる。また、通報システムとの連携を強化するため、警察が提供する犯罪データベースと直接連携し、照合プロセス中にリアルタイムで犯罪情報を取得し、照合結果の精度を向上させる機能も導入される。
通報の即時性を高めるために、通話が開始された瞬間に照合プロセスが開始され、悪用歴のある番号が検出された場合、通話者に警告音を出すか、自動的に通話を遮断するオプションも設けられる。さらに、通話を遮断した際には、通報者に代わって通話内容の録音を保存し、警察の調査に役立てることができる。警察が介入する際には、通報者の位置情報や通話履歴を含む詳細なレポートが自動生成され、犯罪捜査の迅速化を支援する。
ユーザインターフェースには、通報システムの透明性を高めるために、通報プロセスの進行状況をリアルタイムで確認できる機能が追加される。通報の結果や警察からのフィードバックをユーザが確認できるようにすることで、システムへの信頼性を向上させる。また、悪用歴のある番号に関する統計データやトレンド分析を提供し、ユーザが通話に対する警戒心を持つための情報提供も行われる。
さらに、悪用歴のある番号を特定するための機械学習技術を導入し、通話パターンや通話の頻度などの様々な指標を分析することで、悪用の可能性が高い新たな番号を予測する。これにより、データベースの予防的な更新が可能となり、未知の犯罪行為を防ぐための対策を強化する。また、ユーザが通報システムの効果について直接フィードバックを提供できる機能を設け、システムの改善に役立てる。フィードバックは匿名で行われることで、ユーザのプライバシーを保護しつつ、システムの改善に資する貴重な情報を収集する。
通話中の電話番号チェックをより効果的にするためには、ユーザーが直面する可能性のある様々な詐欺のパターンをAIが学習し、特定の単語やフレーズが通話中に検出された際にリアルタイムでフラグを立てる機能を実装する。これにより、単に電話番号がデータベース内の悪用歴と一致するだけでなく、通話の内容からも悪意を持った行動を推測し、検出することが可能になる。また、ユーザーが詐欺を疑う通話を簡単に報告できるショートカットやボタンをスマートフォンのインターフェースに設け、報告プロセスを簡略化する。これにより、データベースはより迅速に更新され、他のユーザーに対する保護が向上する。
警察との連携を強化するためには、通報された情報を基に警察が迅速に対応できるよう、通報システムに位置情報追跡機能を統合し、犯罪者の追跡と捕捉を支援する。また、通報システムに組み込まれる人工知能は、通報データから犯罪パターンを分析し、予防的な警戒活動を計画するための情報を警察に提供する。このような予測分析を活用することで、将来的な犯罪を未然に防ぐことに繋がる。
警察とのデータ共有を促進するために、警察が把握している詐欺事件やその他の犯罪に関する情報をリアルタイムで受け取り、データベースを更新する機能を設ける。これにより、通報システムは最新の犯罪情報に基づいて機能し、ユーザーを守るための対策が強化される。さらに、システムにはブロックリスト機能を追加し、ユーザーが自身で疑わしい番号を登録して通話を拒否できるようにする。これにより、ユーザー自身が直接リスクをコントロールすることが可能になる。
教育プログラムとして、ユーザーが詐欺の手口を認識し、予防するための情報を提供するオンライン講座やワークショップを開催する。これにより、ユーザーは自分自身を守るための知識を得ることができ、社会全体のセキュリティ意識が向上する。また、通報システムの利用によって防がれた詐欺事件の事例を共有し、ユーザーがシステムの実効性を理解しやすくする。
最後に、システムのアップデートを通じて、通話が詐欺である可能性が高いと判断された場合に、ユーザーに自動的に警告メッセージを送信し、詐欺に対する警戒を促す機能を実装する。これにより、ユーザーは即座に詐欺である可能性を認識し、適切な対応を取ることができるようになる。
(Example 1)
The embodiment of the present invention is a system that includes a response means in which a generative AI responds to an incoming call from an unregistered or withheld phone number, and a transmission means that documents the matter with ChatGPT after the call ends and sends it via a messenger app or email. In addition, the system includes a confirmation means that uses a dedicated question or password to verify a caller who claims to be a family member or acquaintance, a connection means that connects a delivery company to the user when the company uses a password for delivery companies, and a linking means that checks the phone number during the call and links with the police if the number has a history of misuse.
The response means is realized by the specific processing unit 290 of the data processing device 12, and responds using a generative AI when a call is received from an unregistered or withheld phone number. The transmission means is realized by the specific processing unit 290 of the data processing device 12 as a means for documenting the matter with ChatGPT after the call ends and sending it via a messenger app or email. The confirmation means may be realized by the control unit 46A of the headset type terminal 314 as a means for verifying family members or acquaintances using dedicated questions or passwords. The connection means may be realized by the control unit 46A of the headset type terminal 314 as a means by which a delivery company is connected to the user using a password for delivery companies. The linking means is realized by the specific processing unit 290 of the data processing device 12 as a means for checking the phone number during a call and linking with the police if the number has a history of misuse. The correspondence between each means and the device or control unit is not limited to the above example, and various changes are possible.
The response means includes a speech understanding function that utilizes voice recognition technology, and responds to calls from unregistered or withheld numbers in natural language generated by a pre-trained AI. This AI has been trained on a large amount of call data and is able to generate answers suited to various scenarios. When a call is received, the AI analyzes the content of the call in real time and selects an appropriate response to carry the conversation forward. The response content is customized to the nature of the call, so questions from sales calls are answered politely and suspicious calls are answered with caution. In addition, the response means can also use speech synthesis technology and speech content understanding technology so that a pre-trained AI responds appropriately to calls from unregistered or withheld phone numbers. This AI learns multiple response scenarios, grasps the intention of the call, and can provide a natural dialogue suited to the situation. The AI instantly selects a response based on the content of the call and tailors it to the purpose of the call, so inquiries from sales calls can be handled appropriately and suspicious calls can be answered with caution.
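The condition that hands a call to the generative AI first can be sketched as a small routing check: the AI answers when the caller ID is withheld or when the number is not in the user's registered contacts. The function name and the digit-only normalisation are assumptions made for illustration; the text does not describe how numbers are compared.

```python
from typing import Optional, Set


def digits_only(number: str) -> str:
    """Strip hyphens, spaces, and other non-digit characters."""
    return "".join(ch for ch in number if ch.isdigit())


def should_ai_answer(caller_number: Optional[str],
                     registered_numbers: Set[str]) -> bool:
    """Return True when the generative AI should take the call first."""
    # A withheld number arrives with no caller ID at all.
    if caller_number is None or digits_only(caller_number) == "":
        return True
    # Normalise both sides so "090-1234-5678" matches "09012345678".
    registered = {digits_only(n) for n in registered_numbers}
    return digits_only(caller_number) not in registered


contacts = {"090-1234-5678"}
```

With the hypothetical `contacts` set above, a call from `09012345678` rings through to the user as usual, while a withheld or unknown number is answered by the AI.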
The transmission means includes voice recognition technology that converts the contents of the call into text and natural language generation technology that generates a summary of the call based on that text. When the call ends, the AI converts the contents of the call into text data and summarizes and documents the contents. The generated document is sent to the user via a communication means such as a messenger app or email. This process allows the user to check the details of the call at a later date and ensures that important information is not overlooked. The transmission means can also convert the contents of the completed call into text using voice recognition technology, then summarize it using natural language processing technology, and send the generated document to the user via a messenger app or email. The user can check the details of the call at any time through the sent document, and important information is recorded and will not be overlooked.
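The post-call flow described above (transcript, then summary, then a message on the chosen channel) can be sketched as follows. The summariser is injected as a plain function so that the example runs without any external service; in the described system that role would be played by the generative AI. The class name, function names, and message layout are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CallSummaryMessage:
    channel: str    # "messenger" or "email"
    recipient: str
    body: str


def document_call(transcript_lines: List[str],
                  summarize: Callable[[str], str],
                  channel: str,
                  recipient: str) -> CallSummaryMessage:
    """Turn recognised speech into a summary message ready to send."""
    if channel not in ("messenger", "email"):
        raise ValueError(f"unsupported channel: {channel}")
    # Drop empty recognition results and join the rest into one transcript.
    transcript = "\n".join(line.strip() for line in transcript_lines
                           if line.strip())
    summary = summarize(transcript)
    body = f"Call summary:\n{summary}\n\nFull transcript:\n{transcript}"
    return CallSummaryMessage(channel=channel, recipient=recipient, body=body)


def first_line_summary(transcript: str) -> str:
    """Placeholder summariser: returns the first line of the transcript."""
    return transcript.splitlines()[0] if transcript else ""


message = document_call(
    ["Caller: This is the dental clinic.", "  ", "Caller: Please call back."],
    first_line_summary, "email", "user@example.com")
```

Keeping the full transcript below the summary lets the user check details later, which matches the stated aim that important information is not overlooked.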
The confirmation means has the function of verifying the identity of the caller by using specific questions or passwords when the caller claims to be a family member or acquaintance. With this means, the AI uses questions and passwords set by the user, and the caller's identity is confirmed when the caller gives the correct response. If identity verification succeeds, the AI continues the call and transfers it to the user if necessary. This process prevents unauthorized access and fraud while ensuring smooth communication when the caller is who they claim to be. The confirmation means can also have the AI present questions and passwords set by the user to callers claiming to be family members or acquaintances, and continue the call or transfer it to the user only if an accurate answer is given. This function prevents unauthorized access and fraud in advance, while enabling smooth communication once the caller's identity has been confirmed.
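A minimal sketch of the password check follows, assuming the caller's spoken answer has already been converted to text by speech recognition. The normalisation rules, the function names, and the action labels are assumptions; the text only states that the call is continued or transferred when the answer is correct.

```python
import hmac
import unicodedata


def normalise(answer: str) -> str:
    """Fold width variants and letter case so equivalent answers match."""
    # NFKC maps full-width characters to their standard forms.
    return unicodedata.normalize("NFKC", answer).strip().casefold()


def verify_caller(spoken_answer: str, expected_answer: str) -> bool:
    """Compare the caller's answer with the one the user registered."""
    given = normalise(spoken_answer).encode("utf-8")
    expected = normalise(expected_answer).encode("utf-8")
    # compare_digest avoids leaking match position through timing.
    return hmac.compare_digest(given, expected)


def next_action(spoken_answer: str, expected_answer: str) -> str:
    """Hypothetical action labels for the call-handling logic."""
    if verify_caller(spoken_answer, expected_answer):
        return "transfer_to_user"
    return "do_not_transfer"
```

For example, `next_action("  Blue Heron ", "blue heron")` yields `"transfer_to_user"`. Speech recognition output is noisy, so a real system would likely tolerate small transcription differences rather than require an exact match.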
The connection means has a function that allows a person in a specific occupation, such as a delivery worker, to talk to the user by using a password for that occupation. With this means, the AI receives the occupational password, verifies that it is correct, and then connects the call to the user. The occupational password is important for ensuring the security of the call, and if an incorrect password is given, the call is not connected. The connection means can also have a function in which the AI receives a password specific to an occupation, such as delivery worker, verifies its validity, and then connects the call to the user. This password ensures the security of the call, and if the password is incorrect, the call is disconnected.
The linking means has a function of checking the phone number in real time during a call and automatically linking with the police or related organizations if the number has a history of misuse. This means refers to a list of suspicious numbers stored in a database and checks whether the incoming number is included in the list. If the call is determined to be from a suspicious number, the system immediately starts a reporting protocol to the police and takes appropriate action depending on the situation. The linking means can also monitor the number of the call in progress in real time and automatically link with related organizations such as the police if the number has a history of past misuse. When a suspicious number is detected, the reporting protocol to the police is activated and a prompt response is taken.
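The lookup at the heart of the linking means can be sketched as a set membership test that produces a report record when the incoming number is on the list. An in-memory set stands in for the database, and the `PoliceReport` fields are hypothetical; the actual reporting protocol and its data format are not specified in this passage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Set


@dataclass
class PoliceReport:
    phone_number: str
    detected_at: str  # ISO 8601 timestamp


def check_number(caller_number: str,
                 misused_numbers: Set[str],
                 now: Optional[datetime] = None) -> Optional[PoliceReport]:
    """Return a report if the number has a recorded history of misuse."""
    digits = "".join(ch for ch in caller_number if ch.isdigit())
    if digits not in misused_numbers:
        return None
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return PoliceReport(phone_number=digits, detected_at=timestamp)


# Hypothetical list of numbers with a history of misuse (digits only).
misused = {"0312345678"}
report = check_number("03-1234-5678", misused,
                      now=datetime(2024, 3, 28, 9, 0, tzinfo=timezone.utc))
```

A `None` result means the call proceeds normally. A returned report would be handed to whatever component transmits it to the police or a related organization.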
These means are implemented by combining multiple AI technologies and linking functions to enhance user security and convenience. The data processing device and the control unit of the smart device cooperate to provide a platform for flexibly realizing these means. Each means is designed not only to function independently but also to work together with the others to provide more advanced services. Through these means, users can automate the handling of calls from unregistered or withheld phone numbers and protect themselves from potential risks that arise in daily life. In addition, these means are built around AI technology to improve user security while streamlining daily communication. As an alternative means of data collection that does not use sensors, a process is possible in which the user manually enters information and the system uses that information. Through these means, users can automate the handling of unregistered or withheld calls and strengthen protection against unforeseen events.
Further functions can be added to the system of this embodiment to enhance its convenience. For example, the response means can be added with a function to analyze the user's emotions during a conversation, and the AI can infer the emotions from the tone of the other party's voice and manner of speaking and respond accordingly. This allows for a more careful and empathetic response when the other party is feeling angry or anxious, improving the quality of the call.
The sending method will add a feature that automatically integrates documented calls into users' calendars and reminders, helping them remember appointments and tasks generated during calls and improving productivity.
The verification method has a function to verify the identity of the user using biometric information. For example, biometric information obtained from a smartwatch or fitness tracker can be used to verify that the person on the other end of the line is actually a family member or acquaintance. This reduces the hassle for users while achieving a higher level of security.
The connection means can be given a function that performs authentication when the delivery company scans a QR code (registered trademark) or NFC tag and then connects the call directly to the user. This eliminates the need for a password and allows faster and safer communication with the delivery company.
The linking means can be given a function that detects suspicious keywords in voice data during a call and automatically analyzes the content. If the call is judged suspicious based on the keywords, the system first warns the user before reporting it to the police, which helps the user make a more accurate decision.
With these additional functions, the system of this embodiment not only automatically answers calls, but also more efficiently supports security and schedule management in daily life. In addition, by linking various AI technologies with databases, it provides greater security and convenience to the user's life. Furthermore, by integrating these technologies with the user's smart devices and IoT devices in the home, it becomes possible to realize a more personalized experience.
The system of this embodiment can be further expanded. To increase convenience and security, the response means can be given a function that analyzes the caller's voiceprint and compares it with the voices of registered family members and acquaintances. This makes it possible to determine more accurately whether the caller is the person they claim to be and to continue the call safely without using specific questions or passwords.
The sending means can incorporate a function that uses voice recognition and natural language processing to translate the contents of the call into languages selected by the user. This facilitates communication between users who speak different languages, for example in international business and in multilingual households.
The confirmation means can include a function that verifies identity when the person makes a specific gesture or movement in front of a camera. This allows quick and easy identity verification during a call and further enhances security.
The connection means can be given a function in which the AI verifies the caller by comparing the caller's location information with delivery schedule information. This makes it possible to confirm that the delivery company is calling from a location where goods are actually being delivered, which helps establish the authenticity of the call.
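The comparison of caller location with delivery schedule information can be sketched as follows. This is a minimal illustration only: the `DeliveryStop` record, the `caller_matches_schedule` helper, and the 1 km tolerance are assumptions made for the example and are not specified in this disclosure.

```python
import math
from dataclasses import dataclass


@dataclass
class DeliveryStop:
    # Hypothetical record: coordinates of one scheduled delivery stop.
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def caller_matches_schedule(caller_lat, caller_lon, stops, max_km=1.0):
    """True if the caller is within max_km of any scheduled stop (assumed tolerance)."""
    return any(
        haversine_km(caller_lat, caller_lon, s.lat, s.lon) <= max_km
        for s in stops
    )
```

A caller who passes this check would still be one signal among several; the tolerance would need tuning to the accuracy of the available location data.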
The linking means can be given a function that automatically notifies not only the police but also the user's designated emergency contact when a suspicious call is detected, enabling a rapid response in an emergency.
These additional functions allow for flexible response to user needs while maintaining the basic structure of the existing system. In addition, by linking with smart devices and IoT devices in the home, the system provides a seamless experience in the environment in which the user uses the system on a daily basis. With these additional functions, the system of this embodiment goes beyond automatic answering of calls, improving the security and convenience of users in all aspects of their daily lives.
(Example 2)
This embodiment of the present invention is a system with a sending means that can send the matter to family members when the matter is documented with ChatGPT after the call ends and sent by messenger app or email. Specifically, the user selects the document created after the call ends and specifies "family members" as the destination.
The sending means is realized by the specific processing unit 290 of the data processing device 12 as a means for documenting the matter with ChatGPT after the call ends and sending it to the family by messenger app or email. The sending means may also be realized by the control unit 46A of the headset type terminal 314. The correspondence between each means and the device or control unit is not limited to the above example, and various changes are possible.
The means for documenting the contents after the end of the call includes a conversation content analysis means based on voice recognition technology. After the call ends, this analysis means converts the voice data of the call into text and extracts the main points of the call from the text data. The voice recognition technology uses a deep learning model trained on a large amount of voice data so that it supports various languages and accents. The conversation content analysis means combines morphological analysis and syntactic analysis to identify important information exchanged in the conversation and content requiring action. In particular, a keyword extraction function identifies words and phrases that appear frequently in the conversation, and the main points of the call are documented based on them.
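The frequency-based keyword extraction described above can be illustrated with a short sketch. It assumes the call has already been converted to English text, and it substitutes a simple regular-expression tokenizer and a small hand-written stop-word list for the morphological and syntactic analysis; `extract_keywords`, `summarize`, and `STOPWORDS` are hypothetical names used only for this example.

```python
import re
from collections import Counter

# Assumed stop-word list; a real system would use a fuller, language-specific one.
STOPWORDS = {
    "the", "a", "an", "to", "of", "and", "is", "at", "on", "in", "for",
    "will", "be", "it", "i", "you", "your", "we", "please",
}


def extract_keywords(transcript, top_n=3):
    """Return the top_n most frequent non-stop-words in the transcript."""
    words = re.findall(r"[a-z0-9']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


def summarize(transcript, top_n=3):
    """Keep only the sentences that mention at least one frequent keyword."""
    keywords = extract_keywords(transcript, top_n)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", transcript) if s.strip()]
    main_points = [s for s in sentences if any(k in s.lower() for k in keywords)]
    return {"keywords": keywords, "main_points": main_points}
```

For Japanese calls, the tokenizer would be replaced by a morphological analyzer, since word boundaries are not marked by spaces.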
The means for formatting the documented content into a message includes a text editor function, which organizes the parsed text and arranges the document structure and layout. The message formatting means also provides an interface through which the user can easily check the content of the message and add edits or additional information as necessary.
The sending means sends a message to a specific family member using a destination designation function. This function works in conjunction with the user's contact list and automatically sends the message based on the contact information of the family member selected by the user. The sending means also confirms message transmission and manages a transmission history, so that the user can track the status of a sent message and resend it as necessary. The message is sent in conjunction with a messenger app or an email app; the linking function authenticates the user with the user's account information and transmits the message securely. As a security measure, the sending means encrypts the contents of the message and performs an authentication process at the time of transmission to protect privacy. The message is sent in a format appropriate to the sending method selected by the user: as an instant message in the messenger app, or as an email in the email app.
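As a rough sketch of channel-dependent formatting together with a transmission history that supports resending, consider the following. The `Sender` and `SendRecord` classes and the transport callables are hypothetical; encryption and account authentication are deliberately left out and would sit inside the transports in a real system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SendRecord:
    recipient: str
    channel: str
    summary: str
    payload: dict
    status: str  # "sent", "failed", or "retried"
    sent_at: datetime


@dataclass
class Sender:
    # Assumed interface: channel name -> callable(payload) returning True on success.
    transports: dict
    history: list = field(default_factory=list)

    def format_message(self, channel, recipient, summary):
        """Shape the same summary differently for each channel."""
        if channel == "email":
            return {"to": recipient, "subject": "Call summary", "body": summary}
        if channel == "messenger":
            return {"to": recipient, "text": summary}
        raise ValueError(f"unknown channel: {channel}")

    def send(self, channel, recipient, summary):
        payload = self.format_message(channel, recipient, summary)
        ok = self.transports[channel](payload)
        record = SendRecord(recipient, channel, summary, payload,
                            "sent" if ok else "failed", datetime.now(timezone.utc))
        self.history.append(record)
        return record

    def resend_failed(self):
        """Retry every failed record once and mark the old record as retried."""
        retried = []
        for record in [r for r in self.history if r.status == "failed"]:
            record.status = "retried"
            retried.append(self.send(record.channel, record.recipient, record.summary))
        return retried
```

Keeping the summary in each history record is a design choice made here so that a resend can rebuild the payload for the same channel without consulting other state.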
An example of data collection that does not involve sensors is a scenario in which the user manually takes notes on a phone call. The user records the main points of the call after the call ends, and the manually recorded contents are organized using the text editor function and sent to the family as a formatted message by the sending means. This manual recording is suitable when automatic documentation using voice recognition technology cannot be applied, or when the user wants to convey specific information in his or her own words.
In an embodiment of the present invention, functions beyond documenting the contents of a call can be considered. For example, a function can be added that automatically registers appointments and events mentioned during a call in a calendar in cooperation with a schedule management system based on the text generated by speech recognition. This can save the user the trouble of manually managing the schedule after the call. In addition, it can suggest adding events to a calendar shared among family members, making it easier for all family members to share the schedule.
It is also possible to automatically generate task lists based on documented text data and post them to a shared platform that can be accessed by all family members, where each family member can update the progress of tasks and check off completed tasks, creating an environment for the whole family to share information and cooperate.
In addition, a function can be added that performs sentiment analysis on the documented message and reflects the emotional nuances expressed during the call in the text, helping recipients understand the caller's intent more accurately. For example, if emotions such as joy or anxiety are expressed during a call, they can be conveyed in the text with specific emojis and formatting, which enriches communication.
Another possible function uses voice recognition and analysis to generate a list of frequently asked questions (FAQs) automatically from the content of calls, building a knowledge base that family members can consult when they have similar inquiries. This knowledge base is shared among family members and updated each time a new call occurs, which improves the efficiency of communication within the family.
In addition, the text generated after the call ends can be sent not only via messenger apps or email, but also played back in audio format, making it easier for visually impaired family members or children with difficulty reading and writing to receive the information.
Finally, to protect the privacy of family members, sensitive information in the content documented after the call ends can be identified and automatically anonymized or masked, providing an environment in which information can be shared with peace of mind. This balances respect for each individual's privacy with sharing only the information that is necessary.
This embodiment of the present invention relates to automatic documentation and transmission of call contents, and new functions can be added to the documentation. For example, the document can be analyzed after the call ends to generate action items automatically, and each item that needs attention can be set as a reminder on the user's smart device. Reminders can be prioritized based on the deadlines and importance mentioned in the call, helping the user act without forgetting.
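One way to order reminders by deadline and importance is shown below as a sketch. The `ActionItem` fields, the three-level importance scale, and the rule that dated items come before undated ones are assumptions for illustration; the disclosure does not fix a particular ordering rule.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ActionItem:
    title: str
    due: Optional[date]  # None when no deadline was mentioned in the call
    importance: int      # assumed scale: 1 (low) to 3 (high)


def prioritize(items):
    """Earliest deadline first; ties broken by higher importance; undated items last."""
    return sorted(
        items,
        key=lambda item: (item.due is None, item.due or date.max, -item.importance),
    )
```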
Additionally, to enhance communication among family members, it would be beneficial to add the ability to automatically attach relevant images, videos, and links based on specific words or phrases in a written message, allowing for visual information to be shared in addition to text-based messages, making communication richer.
In addition, the system can enhance security by using natural language processing techniques to detect certain personal or confidential information automatically when documenting call contents and masking it in accordance with privacy protection guidelines.
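Masking of personal information in a call document can be sketched with pattern matching. The three patterns below (a Japanese-style hyphenated phone number, an email address, and a 16-digit card number) are illustrative assumptions; a production system would need broader patterns or a trained entity recognizer.

```python
import re

# Hypothetical patterns, applied in this order so card numbers are masked first.
PATTERNS = {
    "CARD": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "PHONE": re.compile(r"\b0\d{1,4}-\d{1,4}-\d{3,4}\b"),
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def mask_sensitive(text):
    """Replace each detected item with a bracketed label such as [PHONE]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```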
When converting call contents into text, it is also effective to incorporate a translation function into multiple languages to meet the diverse needs of users. In a multicultural environment where family members speak different languages, it is possible to automatically translate the call contents and send messages in a language that each member can easily understand.
To further enhance convenience, a scheduling function can be added that lets users customize when messages are sent. Users can set a message to be sent at a specific date and time rather than immediately, which lets them choose the timing at which family members receive the information.
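Scheduled sending can be modeled as a time-ordered queue that is polled for due messages. The sketch below uses a heap for that purpose; `ScheduledOutbox` and its methods are hypothetical names, and actual delivery is left to whatever sending function the system provides.

```python
import heapq
from datetime import datetime
from itertools import count


class ScheduledOutbox:
    """Holds messages until their send time; the earliest send time is popped first."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker so equal times keep insertion order

    def schedule(self, send_at, recipient, text):
        heapq.heappush(self._heap, (send_at, next(self._seq), recipient, text))

    def pop_due(self, now):
        """Remove and return every (recipient, text) whose send time has arrived."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            _, _, recipient, text = heapq.heappop(self._heap)
            due.append((recipient, text))
        return due
```

A periodic task would call `pop_due(datetime.now())` and hand each returned message to the sending means.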
Finally, quick action buttons for replying to or acknowledging messages can be provided so that recipients can act on received messages easily. This enables quick feedback and efficient communication and makes information sharing among family members smoother.
(Example 3)
This embodiment of the present invention is a system that checks the phone number during a call and has a linking means for linking with the police if the number has a history of misuse. Specifically, the system checks the incoming number against a database during the call and automatically notifies the police if it detects that the number has a history of misuse.
The linking means is realized by the specific processing unit 290 of the data processing device 12 as a means for checking a telephone number during a call and linking with the police if the number has a history of misuse. The linking means may also be realized by the control unit 46A of the headset type terminal 314. The correspondence between each means and the device or control unit is not limited to the above-mentioned example, and various modifications are possible.
The extraction means extracts the caller information contained in the signal of each call sent through the communication network, and uses digital signal processing technology to obtain the caller's telephone number accurately from the call data. It combines a signal extraction function, which extracts the caller's telephone number from the call signal, with a signal analysis function based on an algorithm specialized in decoding Caller ID information, so that the calling number can be identified accurately.
The matching means matches the extracted telephone number against a database that collects telephone numbers suspected of fraudulent use, lists numbers with a history of misuse, and organizes them by category. The matching is performed in real time through a database management system, and high-speed search algorithms and indexing technology allow rapid search and matching while the call is in progress.
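The in-call matching step can be sketched as a lookup in an index keyed by a normalized number, which keeps each check to a single hash lookup. The normalization rule (a leading 0 is treated as a domestic prefix and replaced by an assumed default country code of 81) and the `MisuseIndex` class are assumptions for this example; the actual database and its schema are not specified here.

```python
import re


def normalize(number, default_country="81"):
    """Reduce a phone number to '+<digits>' so different spellings compare equal."""
    digits = re.sub(r"\D", "", number)
    if number.strip().startswith("+"):
        return "+" + digits
    if digits.startswith("0"):
        # Assumed: a leading 0 is a domestic trunk prefix.
        return "+" + default_country + digits[1:]
    return "+" + digits


class MisuseIndex:
    """In-memory index from normalized number to misuse category."""

    def __init__(self, entries):
        # entries: iterable of (number, category) pairs
        self._by_number = {normalize(n): category for n, category in entries}

    def lookup(self, incoming):
        """Return the category for a listed number, or None if it is not listed."""
        return self._by_number.get(normalize(incoming))
```

For example, `MisuseIndex([("03-1234-5678", "fraud")]).lookup("+81-3-1234-5678")` finds the entry even though the two spellings differ.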
The linking means has a function of automatically reporting to the police when a number with a history of misuse is detected, and includes an interface function with the reporting system. When reporting, a report data generation function creates report information in a data format based on the protocol of the police reception system, and a secure report transmission function sends it to the police reception system over a secure communication channel. The report information includes the detected phone number, the date and time of the call, the duration of the call, and the telecommunications carrier used by the caller. Encryption and authentication are used to protect personal information and to ensure the security and reliability of the report. The linking means is also compatible with multiple communication protocols and reporting systems: an adapter function enables data exchange between different systems, and a report protocol selection function selects an appropriate protocol and adjusts the format of the report information to the requirements of the police reception system. When the reporting process is activated, the system transmits the report information to the police reception system and obtains a confirmation of receipt. A report history recording function stores this confirmation as a record that the report was made correctly, and the report history is used for later analysis and improvement of the efficiency and accuracy of the reporting process.
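The report generation, protocol adapter, and report history functions can be illustrated together in a small sketch. The protocol names `json-v1` and `kv-v1`, the field names, and the transport callable are invented for this example; no real police reception protocol is implied, and encryption and authentication are omitted.

```python
import json
from datetime import datetime, timezone


def build_report(number, started_at, duration_s, carrier):
    """Collect the report fields named in the description into one record."""
    return {
        "number": number,
        "call_started_at": started_at.isoformat(),
        "duration_seconds": duration_s,
        "carrier": carrier,
    }


def to_json(report):
    return json.dumps(report, sort_keys=True)


def to_key_value(report):
    return "\n".join(f"{key}={value}" for key, value in sorted(report.items()))


# Hypothetical adapters: one serializer per receiving protocol.
ADAPTERS = {"json-v1": to_json, "kv-v1": to_key_value}


class Reporter:
    def __init__(self, transport):
        # Assumed interface: transport(protocol, body) returns a receipt id or None.
        self._transport = transport
        self.history = []

    def report(self, protocol, report):
        body = ADAPTERS[protocol](report)
        receipt = self._transport(protocol, body)
        self.history.append({
            "number": report["number"],
            "protocol": protocol,
            "receipt": receipt,
            "ok": receipt is not None,
        })
        return receipt
```

Separating the adapters from the transport means a new reception format only requires a new serializer, which is one plausible reading of the adapter function described above.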
As an example of data collection without sensors, a function is provided that lets users report suspicious calls through an application or web interface. Using an interactive reporting form, the user enters their experience of the call and any suspicious points they noticed, and the information is registered in a database. Numbers collected through these manual reports can be added to the list of phone numbers used for automatic matching, which expands the matching database with data based on the user's experience and intuition and improves the accuracy of detecting numbers with a history of misuse.
The system can be given functions that strengthen the database update mechanism. For example, newly confirmed misused numbers can be added to the database automatically after they are reported. In addition, when a suspicious call is reported, the trustworthiness information for the number can be checked against other databases and combined with information from user reports to identify misuse history more accurately. To maintain the integrity of the database, a regular cleaning process removes incorrect and outdated information. To strengthen cooperation with the reporting system, a function can also be introduced that links directly with the crime database provided by the police, obtains crime information in real time during the matching process, and improves the accuracy of the matching results.
To improve the immediacy of reports, the matching process begins the moment the call is initiated, and if a number with a history of misuse is detected, the user is given the option of sounding an alarm or having the call hung up automatically. When the call is hung up, a recording of the call can be saved on the user's behalf to assist police investigations. When the police intervene, a detailed report including the caller's location and call history is generated automatically, which helps speed up criminal investigations.
The user interface will be updated to include a feature that allows users to check the progress of the reporting process in real time to increase transparency in the reporting system. Users will be able to check the results of their reports and feedback from the police, which will increase trust in the system. The system will also provide statistical data and trend analysis on abused numbers, providing information to users to be cautious about calls.
Furthermore, machine learning technology will be introduced to identify numbers with a history of abuse, and new numbers likely to be abused will be predicted by analyzing various indicators such as call patterns and call frequency. This will enable proactive updates to the database, strengthening measures to prevent unknown criminal activity. A function will also be added that allows users to provide direct feedback on the effectiveness of the reporting system, which will help improve the system. Feedback will be provided anonymously, protecting user privacy while collecting valuable information that will contribute to improving the system.
To make the phone number check during a call more effective, AI will learn the patterns of various fraudulent scams that users may face and implement a function to flag in real time when certain words or phrases are detected during a call. This will allow malicious behavior to be inferred and detected from the content of the call, rather than simply matching the phone number with abuse history in the database. In addition, the reporting process will be simplified by providing shortcuts and buttons on the smartphone interface that allow users to easily report calls that they suspect are fraudulent. This will allow the database to be updated more quickly and improve protection for other users.
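Real-time flagging of suspicious phrases can be sketched as a running score over the utterances recognized so far. The phrase list, the weights, and the threshold below are placeholders chosen for the example; the description envisions patterns learned by AI rather than a fixed list.

```python
# Hypothetical phrase weights; a deployed system would learn these from data.
SUSPICIOUS_PHRASES = {
    "gift card": 2,
    "wire transfer": 2,
    "verify your account": 1,
    "do not tell anyone": 3,
}


def score_utterance(text):
    """Return (phrase, weight) for every listed phrase found in the utterance."""
    lowered = text.lower()
    return [(p, w) for p, w in SUSPICIOUS_PHRASES.items() if p in lowered]


class CallMonitor:
    """Accumulates a suspicion score as recognized utterances arrive."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.score = 0
        self.hits = []

    def feed(self, utterance):
        """Add one utterance; return True once the call should be flagged."""
        for phrase, weight in score_utterance(utterance):
            self.score += weight
            self.hits.append(phrase)
        return self.score >= self.threshold
```

The list of matched phrases in `hits` could be shown to the user alongside the warning so that the reason for the flag is visible.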
To strengthen cooperation with the police, the reporting system will be integrated with location tracking capabilities to help police respond quickly based on reported information, helping them track and capture criminals. Artificial intelligence will also be built into the reporting system to analyze crime patterns from report data and provide police with information to plan preventive vigilance activities. Using such predictive analytics will help prevent future crimes.
To facilitate data sharing with the police, the system will receive real-time information on fraud cases and other crimes known to the police and update the database. This will ensure that the reporting system operates based on the latest crime information and strengthen measures to protect users. In addition, the system will be equipped with a block list function, allowing users to register suspicious numbers and block calls from them. This will allow users to directly control the risks themselves.
Educational programs will include online courses and workshops to provide users with information to recognize and prevent fraud methods. This will provide users with the knowledge to protect themselves and raise security awareness in society as a whole. Examples of fraud cases that were prevented by using the reporting system will also be shared to help users understand the effectiveness of the system.
Finally, through a system update, if a call is deemed likely to be fraudulent, a warning message will be automatically sent to users to warn them of fraud. This will allow users to immediately recognize the possibility of a fraud and take appropriate action.

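(Embodiment Example 1)
This embodiment of the present invention is a system comprising a response means by which a generative AI responds when a call arrives from an unregistered or unnotified phone number, and a sending means that documents the matter with ChatGPT after the call ends and sends it via a messenger app or email. In addition, as an emotion recognition means that incorporates an emotion engine for recognizing the user's emotions, the system analyzes the user's tone of voice, choice of words, and the like during the call and estimates the user's emotion. Based on the estimated emotion, it adjusts the generative AI's responses and the wording of the documented matter. For example, when the user shows anxiety, gentler wording is used to give reassurance.
The response means is realized by the specific processing unit 290 of the data processing device 12 and provides the function by which the generative AI responds to calls from unregistered or unnotified phone numbers. The sending means is realized by the specific processing unit 290 of the data processing device 12 as the function of documenting the matter with ChatGPT after the call ends and sending it via a messenger app or email. The emotion recognition means is realized by the control unit 46A of the headset type terminal 314 as the function of analyzing the user's tone of voice and choice of words during the call and estimating the emotion, but may instead be realized by the specific processing unit 290 of the data processing device 12. The correspondence between each means and the device or control unit is not limited to the above example, and various changes are possible.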
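The response means includes a dialogue generation means that combines natural language processing and speech synthesis. This dialogue generation means responds automatically to unregistered or unnotified incoming calls and enables a dialogue with the user. When such a call is detected, the generative AI responds using a pre-trained conversation model. This model has been trained on diverse dialogue data so that it can handle a variety of scenarios, and it provides appropriate answers and guidance in response to the caller's questions and requests. The AI dialogue generation function combines a dialogue management function, which understands the caller's purpose and wishes in the early stage of the call and produces a corresponding response, with a voice generation function, which converts the generated response into natural speech. The dialogue management function analyzes the caller's intent based on the detection of specific keywords and phrases and generates an appropriate reply. The voice generation function converts text-based responses into speech in real time and provides the caller with a natural conversational experience.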
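The sending means includes a documentation means that uses chatbot and natural language understanding technology. After the call ends, an advanced natural language understanding model such as ChatGPT is used to document the call contents accurately. The documentation means extracts and summarizes the main points of the call and can generate a document that conveys the matter concisely and clearly, in a form the user can access. When the call ends, the call content documentation function is activated, and the summarized text is sent through an automatic delivery function to the email address or messenger app specified by the user. The automatic delivery function includes a linking function with the user's email address and messenger account, and email sending and app linking functions that put the document in an appropriate format and deliver it reliably to the specified destination.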
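The emotion recognition means includes a voiceprint analysis means that performs voice analysis and a language analysis means that estimates emotion from the choice of words. The voiceprint analysis means uses the microphone of the smart device to detect features of the user's voice such as tone, pitch, intensity, and speed, and estimates the user's emotional state from these characteristics. The language analysis means analyzes the content of the user's utterances during the call and extracts the emotional context from the words and phrases used. These analysis results are used when the generative AI responds and when the wording of the documented matter is adjusted, so that an appropriate emotional response is provided to the user. For example, when the user shows anxiety, the wording of responses and documents is adjusted to be gentler and more reassuring.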
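Data collection means that do not involve sensors include text data entered by the user and feedback on the use of the system. These are provided to the system through a user input reception function and a feedback collection function and are used as a valuable source of information for improving the service.
These means are intended to respond quickly and effectively to the user's requests and to improve the quality of communication. Each means is implemented flexibly by the data processing device or the control unit of the smart device, and can be modified in various ways to improve the efficiency and usability of the system.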
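As an additional function, the system of the present invention can include a summary prediction means for inferring the caller's intent before responding to an unregistered or unnotified incoming call. This allows the response means to generate more accurate dialogue and realizes an exchange that is meaningful to the user. In addition, when the generative AI responds, it can be given a language selection function based on the caller's country or region, enabling multilingual automatic responses.
Regarding the sending means, in addition to documenting the call contents, a function for highlighting important keywords and phrases can be provided so that the user can grasp the document quickly. Furthermore, a function can be added that automatically generates action items based on the documented content and adds them to the user's task list.
The emotion recognition means can be given a function that, when the user's emotion changes during the call, detects the change in real time and dynamically adjusts the tone and tempo of the response means' dialogue. In addition, a protocol can be incorporated that, when a specific emotion is detected, prompts contact with a specialist who provides corresponding support and advice.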
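A personalization function can be added to the response means that analyzes the caller's past call history and related data and provides more personalized responses. This makes it possible to provide information that is more relevant to the user and to increase the usefulness of the responses.
Regarding the sending means, a function can be added that proposes follow-up actions based on the documented call contents. For example, integrating a function that automatically registers tasks and appointments mentioned in the call in a calendar app supports the user's time management.
Furthermore, the emotion recognition means can be given a function that detects the user's stress level and tension and, as appropriate, provides advice for reducing stress and links to relaxation content. This supports the user's mental health and promotes overall well-being.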
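Several additional functions can be considered to further improve the system of this embodiment example. For example, a voiceprint recognition means can be added that analyzes the caller's voiceprint for unregistered or unnotified incoming calls and compares it with previous call data to identify the caller. If the caller has interacted with the system in the past, the response means can respond more appropriately based on that information. The voiceprint recognition means also functions as a security measure and provides the user with a highly reliable call experience.
For the sending means as well, a function can be considered that, when documenting the call contents, structures the contents and arranges the text hierarchically according to the importance of the information. This allows the user to grasp important information more quickly when reading the document. Furthermore, customizing the documented content to the user's preferences and past behavior patterns makes it possible to provide a more personal experience.
The emotion recognition means detects the user's stress level during the call and, when stress is estimated to be high, provides guidance to relaxation methods and psychological support related to the call contents. This provides a service that helps the user relax and reduce stress through the call.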
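Furthermore, a function is added to the response means that automatically proposes follow-up actions to the user based on the call contents. For example, it supports the user's decision-making by providing links to additional information on products and services proposed during the call or by suggesting next steps.
As an improvement to the response means, a function that automatically changes the response style according to the caller's intent is considered. For example, when the caller indicates an urgent situation, the AI is adjusted to provide quick and accurate instructions. This realizes a system that can respond immediately to the caller's needs.
These additional functions aim to improve the user experience and to enable accurate understanding of call contents and quick responses. Each function is expected to make full use of the capabilities of the data processing device and the control unit of the smart device and to further increase the usefulness of the system.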
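(Embodiment Example 2)
This embodiment of the present invention is a system comprising an adjustment means that, when the matter is documented with ChatGPT after the call ends and sent via a messenger app or email, adjusts the wording of the document using an emotion engine that recognizes the user's emotions. Specifically, the documented matter is input to the emotion engine and the user's emotion is analyzed. Based on the analysis result, the wording of the document is adjusted appropriately. For example, when the user feels joy or excitement, brighter and livelier wording is used to share the user's emotion.
The adjustment means is realized by the specific processing unit 290 of the data processing device 12 as the function of recognizing the user's emotion and adjusting the wording of the document when the matter is documented with ChatGPT after the call ends and sent via a messenger app or email. The adjustment means adjusts the wording of the document appropriately based on the analysis result from the emotion engine. The adjustment means may also be realized by the control unit 46A of the headset type terminal 314. The correspondence between each means and the device or control unit is not limited to the above example, and various changes are possible.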
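The conversation content extraction means uses voice recognition technology to convert the call contents accurately into text data. After receiving the voice signal, it removes background noise using a noise reduction means, which improves the accuracy of the speech-to-text conversion.
The text processing means corrects syntactic errors in the converted text data and performs a grammar check via a grammar checking means to keep the language fluent. The grammar checking means ensures correct usage and makes adjustments that maintain the natural flow of the document.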
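The emotion recognition means includes an emotion extraction means based on text mining and sentiment analysis technology, which estimates the user's emotion from the wording and context of the generated text data. The emotion extraction means identifies words, phrases, and grammatical patterns that express various emotions and classifies them into emotion categories such as positive, negative, and neutral. An emotion emphasis means adjusts the tone and phrasing of the text according to the extracted emotion and makes corrections so that the user's emotional state is conveyed more appropriately.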
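Classification of text into positive, negative, and neutral categories can be illustrated with a minimal lexicon-based sketch. The word lists are tiny placeholders for English text and the counting rule is an assumption; the description itself contemplates text mining and sentiment analysis techniques that would be considerably richer.

```python
import re

# Placeholder lexicons, assumed for illustration only.
POSITIVE = {"glad", "happy", "great", "thanks", "excited", "wonderful"}
NEGATIVE = {"angry", "sad", "worried", "upset", "terrible", "afraid"}


def classify_emotion(text):
    """Label text by whichever lexicon has more matching words."""
    words = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(word in POSITIVE for word in words)
    negative_hits = sum(word in NEGATIVE for word in words)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "neutral"
```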
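The adjustment means changes the wording of the text based on the emotion data analyzed by the emotion recognition means. When a positive emotion such as joy or excitement is detected, an expression enhancement means replaces the vocabulary with brighter, livelier words, giving the message energy and a favorable impression. When a negative emotion such as sadness or anger is detected, an expression softening means makes the tone of the message gentler and selects phrasing that shows empathy and understanding.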
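The message sending means includes a linking function with messenger apps and email clients and sends the adjusted text in a format suited to each. A transmission protocol selection means selects the optimal transmission protocol for the recipient's platform and settings and ensures delivery of the message. A user interface presentation means provides a preview screen for a final review of the document before sending, so that the user can make final corrections as necessary.
The above process is realized by the control unit of the smart device or by the specific processing unit built into the data processing device, depending on the device and settings used by the user. The means are designed as modular components, so that they can easily be replaced or extended as elements of the system. The correspondence between the means is set flexibly, allowing a variety of changes to accommodate system upgrades and customization.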
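This embodiment example can be further extended by adding functions that understand the user's emotions more deeply and improve the quality of communication. For example, incorporating a facial expression recognition function for video chats into the emotion engine allows emotion to be analyzed from visual information as well, giving a more accurate emotion judgment. To improve the accuracy of emotion recognition, the tone and pitch of the user's voice can also be analyzed and reflected in the text. Furthermore, analyzing the user's past communication history and reaction patterns allows the system to learn the individual's style of emotional expression and to realize more personalized text adjustment.
The text processing means has a function that proposes phrasing inspired by literary works, poetry, and the like in order to increase the variety and creativity of expression. This allows the user's emotions to be expressed more richly. It can also take social context and cultural background into account and select expressions suited to the environment and situation in which the communication takes place.
A function is added to the message sending means that predicts the effect the text to be sent will have on the recipient's emotions, so that the user can communicate more responsibly. Furthermore, the AI can predict the recipient's reaction and, based on that information, propose the communication strategy the user should take next.
Overall, these functions help the user reflect emotions accurately and sensitively in communication and contribute to building deeper relationships. These advanced means can be used effectively not only by individuals but also in corporate customer support and CRM systems to deepen relationships with customers. These systems are also expected to be applied in fields where human emotions play an important role, such as user education and counseling.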
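This system uses the emotion engine to adjust the wording of documents according to the user's emotions. To extend this function, sensors that acquire the user's biometric information can be integrated so that emotions are read more accurately from physiological responses such as heart rate and skin conductance. The sensor data is analyzed in real time, making it possible to adjust the tone of the document immediately.
Furthermore, a personalization learning function is provided that continuously analyzes the user's everyday communication and learns the expression style and preferences specific to that person. With this function, the system can propose more natural, individualized documents that reflect the user's personality.
Multilingual support is also strengthened so that emotional nuances can be captured in various languages. This function deepens understanding in international communication and between users who speak different languages.
To protect user privacy, a function is also added that anonymizes and encrypts emotion data to strengthen security. This allows users to use the system with peace of mind.
With a view to applications in education and counseling, a function is developed that uses the results of emotion recognition to support communication skills training. The training program includes practice in emotional expression and learning of appropriate communication techniques.
Finally, to improve the usability of the system, the user interface is made rich and intuitive. Various gestures and voice commands are supported so that the user can easily intervene in the document adjustment process. This lets the user fine-tune the wording as desired and achieve more personal communication.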
（形態例３）
本発明を実施するための形態は、通話中にユーザの感情を認識する感情エンジンを使用して、応答や対話の内容をユーザの感情に合わせて調整する調整手段とを具備するシステムである。具体的には、通話中にユーザの声のトーンや言葉の選択などを感情エンジンに入力し、ユーザの感情を推定する。推定された感情に基づいて、生成系AIの応答や対話の内容を調整する。例えば、ユーザが悲しい感情を示している場合には、共感の言葉を用いて励ましや支援を提供する。
調整手段は、通話中にユーザの感情を認識し、応答や対話の内容をユーザの感情に合わせて調整する機能として、ヘッドセット型端末３１４の制御部４６Ａによって実現される。この調整手段は、通話中にユーザの声のトーンや言葉の選択を感情エンジンに入力し、ユーザの感情を推定し、生成系AIの応答や対話の内容を調整する機能を提供する。また、調整手段は、データ処理装置１２の特定処理部２９０によって実現されてもよい。各手段と装置や制御部との対応関係は、上述した例に限定されず、種々の変更が可能である。
音声データ収集手段は、高感度でノイズキャンセリング機能を備えたマイクロフォンが含まれる。このマイクロフォンは、周囲の雑音を除去し、ユーザの声を明瞭に録音するためのデジタル信号処理アルゴリズムを搭載している。また、音声データ収集手段は、ユーザの発話から微細な音響特性を把握する高感度マイクロフォンを採用し、これにデジタル信号処理技術を組み合わせて周囲の雑音を効果的に除去する。このマイクロフォンは、ユーザの声の特性を正確に捉え、感情の変化を検出するための基礎データを提供する。
音声特徴抽出手段は、音声信号から人間の感情を反映する可能性のある特徴量を抽出する。この抽出手段は、音響特徴分析を行うためのスペクトログラム解析機能やピッチ追跡機能、音響モデルを用いた感情識別機能が含まれる。また、音声特徴抽出手段は、スペクトル分析機能やピッチ解析機能といった音響解析ツールを用いて、音声信号から感情を示唆する特徴量を抽出し、これらのデータを感情推定モデルへと供給する。
感情分析手段には、機械学習に基づいた感情推定モデルが含まれ、音声特徴抽出手段によって抽出された特徴量を入力として、ユーザの感情状態を推定する。感情推定モデルは、トレーニングデータに基づいて訓練されたニューラルネットワーク、サポートベクターマシン、決定木などの分類器から構成され、ユーザの感情をポジティブ、ネガティブ、中立などのカテゴリに分類する。感情推定モデルは、継続的な学習を通じてその精度を向上させ、ユーザの感情に対する認識の微妙な変化にも対応できるように進化する。また、感情分析手段は、機械学習技術を活用した感情推定モデルを有し、抽出された音声特徴を元にユーザの感情を分析する。このモデルは、様々な機械学習アルゴリズムを組み合わせて、ユーザの発話から感情カテゴリーを識別し、これに基づいてユーザの感情状態を推定する。推定された感情状態は、ユーザの発話や振る舞いに対するシステムの反応を調整するための重要な情報となる。
対話調整手段には、生成系AIモデルを用いた応答生成機能が含まれ、感情分析手段によって推定された感情に応じて、対話の内容を動的に調整する。応答生成機能は、自然言語生成技術を駆使し、ユーザの感情に適した言葉選びやトーンを用いて応答文を生成する。例えば、ユーザが悲しい感情を示している場合、共感表現や慰めの言葉を含む応答が生成される。この応答生成機能は、会話の文脈を考慮し、ユーザの感情とコミュニケーションの目的に合致した内容を提供するためのコンテキストアウェア処理機能を備える。また、生成された応答は、ユーザにとって自然であり、感情的なニーズを満たすように構築される。応答生成機能は、大規模な会話データセットに基づいて訓練された機械学習モデルによって実現され、ユーザの言葉遣いや話し方に適応することができる。また、対話調整手段には、生成系AIモデルを用いた応答生成機能が含まれ、推定された感情に適切に応じた対話内容を生成する。この機能は、ユーザの感情に対応する言葉選びや対話トーンを選定し、ユーザの現在の感情や会話の文脈に合わせた反応を提供する。応答生成機能は、コンテキストアウェアな処理を行い、ユーザが必要とする情報やサポートを適切な形で提供するために設計されている。この機能は、大量の会話データから学習されたモデルを基に、ユーザの言葉遣いや情緒に適応した応答を生成することが可能である。
センサーを含まないデータ収集手段としては、ユーザがシステムに直接入力したテキスト情報や、通話記録から得られるメタデータが考えられる。これらは、ユーザの行動パターンや好みを分析する際の補足情報として利用され、感情エンジンの精度向上や応答生成機能の最適化に寄与する。ユーザが入力するテキスト情報は、感情分析手段の一部として、感情推定モデルの訓練データとしても活用される。
本システムは、通話中にユーザの感情をリアルタイムで分析し、対話内容を自動調整する能力を有しているため、さらに細かな感情の変化を捉えるために、音声データに加えて、表情認識技術を統合することができる。ウェブカメラやスマートデバイスのカメラを活用して、ユーザの顔の表情を分析し、感情分析の精度を向上させる。この追加された表情認識機能は、ユーザの感情をより正確に認識し、さらに微細な感情変化に応じた対話調整が可能となる。
また、ユーザの生理的シグナルを捉えるためのウェアラブルデバイスを組み込むことも考えられる。心拍数や皮膚電気活動など、生理的反応を測定することで、声のトーンや表情だけでは捉えきれない感情の深層を解析する。これらのデータを統合することで、感情分析の精度はさらに向上し、より適切な対話応答を生成することができる。
対話調整手段には、ユーザの文化的背景や個人的な価値観を考慮したカスタマイズ機能を追加することも有効である。ユーザプロファイルを構築し、その情報に基づいて、対話のトーンや内容をさらにパーソナライズする。これにより、ユーザ一人ひとりに合わせたきめ細やかなサポートを提供することが可能となる。
さらに、感情推定モデルの進化を促すために、クラウドソーシングによる感情データの収集や、多様なユーザからのフィードバックを取り入れて、モデルを継続的にアップデートする仕組みを構築する。これにより、多様な感情表現や言語に対応できる柔軟なシステムとなる。
また、教育やメンタルヘルスケアの分野における応用も検討することができる。例えば、教育分野では、学生の感情に適応した教材の提示やカウンセリングセッションでの使用が考えられる。メンタルヘルスケアでは、ユーザの感情を認識し、ストレスや不安を軽減するための対話支援を行う。これにより、ユーザが抱える問題に対してより効果的なアプローチが可能となる。
システムのプライバシー保護に関しても、ユーザの感情データを安全に保管し、適切なアクセス制御と暗号化技術を用いて、情報漏洩のリスクを最小限に抑えるためのセキュリティ対策を強化する。これにより、ユーザは安心してシステムを利用することができる。
最終的には、このシステムが提供するパーソナライズされた対話体験が、ユーザの生活の質を向上させるようなサービスへと発展することが期待される。
本発明の形態は、通話のみならず、ビデオ会議やオンライン教育のプラットフォームにも適用可能である。例えば、講師が生徒の感情をリアルタイムで把握し、カリキュラムの進行を感情に合わせて調整することで、より効果的な学習経験を提供する。ビデオ会議においても、参加者の感情を反映した対話管理が行われ、生産的かつポジティブな会議環境を促進する。
また、本システムには、感情反応に基づいた健康状態の監視機能を追加することも可能である。例えば、ユーザの声のトーンや話し方が一定期間にわたってネガティブな感情を示している場合、メンタルヘルスの専門家に通知を送り、必要に応じた介入を促すことができる。
さらに、感情エンジンの高度化に向け、ユーザの日常生活における感情パターンを分析し、その情報を元に長期的な感情管理やストレス軽減のアドバイスを提供する機能を組み込む。ユーザの生活リズムや活動パターンを分析し、感情の波を予測することで、適切なタイミングでリラクゼーションやモチベーション向上のためのコンテンツを提案する。
このシステムは、カスタマーサポートの分野でも応用が期待される。例えば、コールセンターのオペレーターが顧客の感情をリアルタイムで把握し、不満や怒りなどのネガティブな感情を検出した際には、即座に対応策を講じ、顧客満足度の向上に寄与する。
また、ゲームやエンターテインメントの分野でも、ユーザの感情に応じてコンテンツを動的に変化させることで、没入感や楽しさを増幅させる効果が期待される。ゲーム内のキャラクターがプレイヤーの感情に反応し、ストーリー展開や対話内容が変化することで、よりパーソナライズされた体験を実現する。
さらに、音声アシスタントや仮想現実（VR）との統合を図り、ユーザの感情に対してより自然な対話を実現する。音声アシスタントはユーザの感情を把握し、個々のニーズに合わせた情報やサービスを提供する。VR環境では、ユーザの感情に応じてシナリオや環境が変化し、リアルタイムで感情に合わせた体験を提供する。
本システムは、ユーザインタフェース（UI）やユーザーエクスペリエンス（UX）の設計においても革新をもたらす可能性を秘めている。感情認識技術を利用して、ユーザの感情に最適化されたUIやUXを提供し、利用者の満足度を高める。例えば、ウェブサイトやアプリケーションがユーザの感情をリアルタイムで把握し、コンテンツの提示方法やインタラクションの形式を調整する。
最終的には、このシステムが提供する感情調整機能が、人間関係の質を向上させ、コミュニケーションの効果を高めるツールとして社会に広く浸透していくことが期待される。
(Example 1)
This embodiment of the present invention is a system that includes a response means, in which a generative AI answers when a call is received from an unregistered or withheld (caller ID blocked) phone number, and a transmission means, in which the caller's message is documented with ChatGPT after the call ends and sent via a messenger app or email. The system further includes an emotion recognition means that incorporates an emotion engine for recognizing the user's emotions: during the call, the tone of the user's voice and choice of words are analyzed to estimate the emotion. Based on the estimated emotion, the generative AI's responses and the wording of the documented message are adjusted. For example, if the user shows anxiety, calmer wording is used to provide a sense of security.
The response means is realized by the specific processing unit 290 of the data processing device 12 and provides the function of having the generative AI answer an incoming call from an unregistered or withheld phone number. The transmission means is realized by the specific processing unit 290 of the data processing device 12 as the function of documenting the message with ChatGPT after the call ends and sending it via a messenger app or email. The emotion recognition means is realized by the control unit 46A of the headset type terminal 314 as the function of analyzing the tone of the user's voice and choice of words during the call and estimating emotions, but it may also be realized by the specific processing unit 290 of the data processing device 12. The correspondence between each means and the device or control unit is not limited to the above example, and various modifications are possible.
The response means includes a dialogue generation means that combines natural language processing and speech synthesis. The dialogue generation means automatically answers incoming calls from unregistered or withheld numbers, enabling a dialogue with the user. When such a call is detected, the generative AI responds using a pre-trained conversation model. This conversation model is trained on a variety of dialogue data so that it can handle various scenarios, and it provides appropriate answers and guidance in response to the caller's questions and requests. The response means also has an AI dialogue generation function that automatically provides an appropriate response to an incoming call from an unregistered or withheld phone number. This function combines a dialogue management function, which identifies the caller's purpose and request early in the call and produces a corresponding response, with a voice generation function, which converts the generated response into natural-sounding speech. The dialogue management function analyzes the caller's intention by detecting specific keywords and phrases and generates an appropriate response. The voice generation function converts the text-based response into speech in real time, giving the caller a natural conversation experience.
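By way of a non-limiting illustration, the decision of whether the generative AI should take the first response may be sketched as follows. The names `IncomingCall` and `should_ai_answer`, and the representation of a withheld number as `None`, are assumptions introduced here and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class IncomingCall:
    # None models a withheld (caller ID blocked) number; this is an assumption.
    caller_number: Optional[str]


def should_ai_answer(call: IncomingCall, contacts: Set[str]) -> bool:
    """Return True when the generative AI should make the first response."""
    if call.caller_number is None:
        return True  # withheld number
    return call.caller_number not in contacts  # unregistered number
```

A call from a number registered in the contact list falls through to the user as usual; only withheld or unregistered numbers are routed to the AI.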
The transmission means includes a chatbot and a documentation means that uses natural language understanding technology. After the call ends, the contents of the call are accurately documented using an advanced natural language understanding model such as ChatGPT. The documentation means extracts and summarizes the main points of the call and generates a document that conveys the caller's purpose concisely and clearly. The generated document is sent to the user via a messenger app or email. This process includes integration with the user's email address or messenger account, so that the document is sent automatically in the appropriate format. The transmission means can also convert the contents of the call into text and provide it to the user in an accessible form. After the call ends, the call content documentation function is activated and extracts and summarizes the main points of the conversation. The summarized text is sent to the user's designated email address or messenger app through an automatic delivery function, which includes an email delivery function and an app integration function that format the document properly and ensure that it is delivered to the designated destination.
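As a minimal sketch of the documentation and delivery steps, the summary may be wrapped in an email message using Python's standard `email.message.EmailMessage`. The summarizer is passed in as a callable because the disclosure relies on a model such as ChatGPT for that step; `first_sentence` is only a hypothetical stand-in, and the subject line and function names are likewise assumptions.

```python
from email.message import EmailMessage
from typing import Callable


def build_summary_email(transcript: str, recipient: str,
                        summarize: Callable[[str], str]) -> EmailMessage:
    """Document the call and wrap the summary in an email message."""
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = "Summary of a screened call"  # hypothetical subject line
    msg.set_content(summarize(transcript))
    return msg


def first_sentence(text: str) -> str:
    # Stand-in summarizer for illustration only; a language model
    # would be used in the system described above.
    return text.split(".")[0].strip() + "."
```

Sending the message (for example over SMTP or through a messenger integration) is left out, since the disclosure does not fix a transport.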
The emotion recognition means includes a voiceprint analysis means for performing voice analysis, and a language analysis means for estimating emotions from the choice of words. The voiceprint analysis means can detect characteristics of the user's voice, such as tone, pitch, and speed, using the microphone of the smart device, and can estimate the user's emotional state from these voice characteristics. The language analysis means can analyze the content of the user's speech and extract emotional context from the words and phrases used. These analysis results are used by the generative AI when making a response or adjusting the expression of documented matters. For example, if the user shows anxiety, the expression of the response or document is adjusted to be more gentle and reassuring. In addition, the emotion recognition means has a function of estimating emotions from the characteristics of the user's voice and language use during a call. The voiceprint analysis function can extract features such as the pitch, strength, and speed of the voice from the voice data collected by the microphone, and can estimate emotions by analyzing these characteristics. The language emotion analysis function can process the language data during a call, analyze the emotional meaning of the words and phrases used, and grasp the user's emotional state. These analytics can be used to adjust the tone of the AI's responses and the wording of documented calls to provide an appropriate emotional response to the user.
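The language analysis side of this means may be illustrated with a very small word-list estimator. This is a sketch under stated assumptions: the word list, the two labels, and the function names `estimate_emotion` and `adjust_response` are hypothetical, and a practical emotion engine would combine voice features with a trained model.

```python
import re

# Hypothetical word list; not taken from the disclosure.
ANXIETY_WORDS = {"worried", "afraid", "scared", "nervous", "anxious"}


def estimate_emotion(utterance: str) -> str:
    """Label an utterance 'anxious' if it contains any anxiety word."""
    words = re.findall(r"[a-z']+", utterance.lower())
    return "anxious" if any(w in ANXIETY_WORDS for w in words) else "neutral"


def adjust_response(text: str, emotion: str) -> str:
    """Prepend a reassuring phrase when anxiety was detected."""
    if emotion == "anxious":
        return "There is no need to worry. " + text
    return text
```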
Data collection methods that do not involve sensors include text data entered by users themselves and feedback on system usage. These are provided to the system through the user input reception function and feedback collection function, and are used as a valuable source of information for service improvement.
These means are intended to respond quickly and effectively to user requests and improve the quality of communication. In addition, the implementation of each means is flexibly performed by the data processing device or the control unit of the smart device, and can be modified in various ways to improve the efficiency and usability of the system.
As an additional function, the system of the present invention can be equipped with an outline prediction means that predicts the caller's intention before an unregistered or withheld call is answered. This allows the response means to generate a more accurate dialogue and makes the exchange more meaningful for the user. In addition, the generative AI can be given a language selection function based on the caller's country or region, enabling automatic responses in multiple languages.
In addition to documenting the contents of the call, the transmission means can highlight important keywords and phrases to help users understand the document quickly, and can automatically generate action items from the documented content and add them to the user's task list.
The emotion recognition means can detect changes in the user's emotions during a call in real time and dynamically adjust the tone and tempo of the conversation of the response means. Protocols can also be built in that, when a particular emotion is detected, prompt the user to contact an expert who can provide special support or advice according to the emotion.
The response means can be enhanced with a personalization function that analyzes the caller's past call history and related data to provide a more personalized response, making it possible to give the user more relevant information and increasing the usefulness of the response.
A function that suggests follow-up actions based on the documented content of the call can be added to the transmission means. For example, tasks and events mentioned in the call can be automatically registered in a calendar app to help users manage their time.
Furthermore, the emotion recognition means may have the ability to detect the user's stress level or tension and provide appropriate advice on how to reduce stress or links to relaxation content, thereby supporting the user's mental health and promoting overall well-being.
A number of additional functions may be added to the system of this embodiment to further improve it. For example, a voiceprint recognition means can be added that identifies the caller of an unregistered or withheld call by analyzing the caller's voiceprint and comparing it with previous call data. This allows the response means to respond more appropriately, drawing on earlier information, if the caller has previously interacted with the system. The voiceprint recognition means also functions as a security measure, providing the user with a reliable call experience.
When documenting the contents of the call, the transmission means may structure the contents and classify the text according to the importance of the information. This allows users to grasp important information more quickly when reading the document. In addition, the documented content can be customized according to the user's preferences and past behavior patterns to provide a more personal experience.
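One common way to compare a new voiceprint with stored ones is cosine similarity between fixed-length feature vectors. The disclosure does not specify the comparison method, so the following is only a sketch: the vector representation, the 0.9 threshold, and the names `cosine_similarity` and `match_caller` are assumptions.

```python
import math
from typing import Dict, List, Optional


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_caller(voiceprint: List[float],
                 known: Dict[str, List[float]],
                 threshold: float = 0.9) -> Optional[str]:
    """Return the best-matching stored caller ID, or None if none is close enough."""
    best_id, best_score = None, threshold
    for caller_id, reference in known.items():
        score = cosine_similarity(voiceprint, reference)
        if score >= best_score:
            best_id, best_score = caller_id, score
    return best_id
```

Returning `None` for a caller with no sufficiently similar stored voiceprint lets the response means fall back to its default dialogue.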
The emotion recognition means detects the user's stress level during a call, and if it is estimated that the user is under high stress, it provides guidance on relaxation methods and psychological support related to the content of the call, thereby providing a service that allows users to relax and reduce stress through the call.
Additionally, the response means can automatically suggest follow-up actions to users based on the content of the call, for example by providing links to additional information about products or services mentioned during the call, or by suggesting next steps to help users make decisions.
To improve the response means, a function may be provided that automatically changes the response style according to the caller's intention. For example, if the caller indicates an emergency, the AI is adjusted to give quick and accurate instructions. This yields a system that can respond immediately to the caller's needs.
These additional functions are intended to improve the user experience and enable accurate understanding of call content and rapid response. Each function is expected to maximize the capabilities of data processing devices and smart device control units, further enhancing the usability of the system.
(Example 2)
This embodiment of the present invention is a system that includes an adjustment means which, when the caller's message is documented with ChatGPT after the call ends and sent via a messenger app or email, adjusts the wording of the document using an emotion engine that recognizes the user's emotions. Specifically, the documented message is input to the emotion engine, and the user's emotions are analyzed. Based on the analysis results, the wording of the document is adjusted appropriately. For example, if the user is feeling happy or excited, brighter and livelier wording is used so that the document shares the user's emotions.
The adjustment means is realized by the specific processing unit 290 of the data processing device 12 as the function of recognizing the user's emotions and adjusting the wording of the document when the message is documented with ChatGPT after the call ends and sent via a messenger app or email. This adjustment means provides the function of appropriately adjusting the wording of the document based on the analysis result of the emotion engine. The adjustment means may also be realized by the control unit 46A of the headset type terminal 314. The correspondence between each means and the device or control unit is not limited to the above example, and various modifications are possible.
The conversation content extraction means converts the contents of the call into text data by making full use of voice recognition technology. After receiving the voice signal, the conversation content extraction means uses noise reduction means to remove background noise and improve the accuracy of the voice-to-text conversion. Furthermore, the conversation content extraction means uses cutting-edge voice recognition technology to accurately convert the contents of the call into text data, and in the process, the noise reduction means removes background noise and improves the conversion accuracy.
The text processing means corrects syntactic errors in the converted text data and performs grammar checking via the grammar checking means to maintain linguistic fluency. Through the grammar checking means, the text processing means also adjusts the converted text data so that it is grammatically accurate and the document retains a natural flow.
The emotion recognition means includes an emotion extraction means based on text mining and emotion analysis techniques, which estimates the user's emotion from the wording and context of the generated text data. The emotion extraction means also identifies words, phrases, and grammatical patterns that express various emotions and classifies them into emotion categories such as positive, negative, and neutral. The emotion emphasis means also adjusts the tone and phrasing of the text according to the extracted emotions, making modifications to better convey the user's emotional state. The emotion extraction means also reads the user's emotion from the language patterns expressed in the document and adjusts the tone of the document based on the emotion.
The adjustment means changes the expression of the text based on the emotion data analyzed by the emotion recognition means. When a positive emotion such as joy or excitement is detected, the expression enhancement means replaces the vocabulary used with brighter and more lively words to give the message a favorable impression. When a negative emotion such as sadness or anger is detected, the expression mitigation means softens the tone of the message and selects phrases that show empathy and understanding. When the user's emotion is positive, the expression enhancement means energizes the document, and when the user's emotion shows negative emotion, the expression mitigation means uses a more gentle expression.
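The branching between the expression enhancement means and the expression mitigation means may be sketched as follows. The vocabulary table, the empathy phrase, and the function name `adjust_expression` are hypothetical; the sketch handles only a single sentence ending in a period.

```python
# Hypothetical vocabulary for the enhancement step.
BRIGHTER = {"good": "wonderful", "nice": "delightful"}


def adjust_expression(text: str, emotion: str) -> str:
    """Enhance the text for positive emotions, soften it for negative ones."""
    if emotion == "positive":
        words = [BRIGHTER.get(w, w) for w in text.rstrip(".").split(" ")]
        return " ".join(words) + "!"
    if emotion == "negative":
        return "I understand this may be difficult. " + text
    return text  # neutral: leave the wording unchanged
```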
The message sending means includes a function for linking with a messenger application or an email client and sends the adjusted text in an appropriate format. Furthermore, the transmission protocol selection means selects an optimal transmission protocol according to the platform and settings of the recipient and ensures delivery of the message. Furthermore, the user interface presentation means provides a preview screen for a final review of the document before sending, allowing the user to make final corrections as necessary. Furthermore, the message sending means sends the adjusted text in a format suitable for the messenger application or the email client, and the preview screen provides an interface for the user to make a final check of the document.
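A minimal sketch of the transmission protocol selection means is given below, assuming that the recipient's settings consist of an email address, a messenger account, and a preference flag. The `Recipient` fields and the function name `select_channel` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recipient:
    email: Optional[str] = None
    messenger_id: Optional[str] = None
    prefers_messenger: bool = False


def select_channel(recipient: Recipient) -> str:
    """Pick a delivery channel from the recipient's settings."""
    if recipient.messenger_id and (recipient.prefers_messenger or not recipient.email):
        return "messenger"
    if recipient.email:
        return "email"
    raise ValueError("recipient has no reachable channel")
```

Raising an error when no channel is available makes the failure visible before the preview screen is shown, rather than silently dropping the document.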
The above processes are realized by a specific processing unit built into the control unit of the smart device or the data processing device, depending on the device and settings used by the user. Moreover, these means are designed as modularized components, and can be easily replaced or expanded as system components. Furthermore, the correspondence between each means is set flexibly, and various changes can be made to accommodate system upgrades and customization. Moreover, this series of processes is modularized so that it can flexibly respond to devices and environments, and is designed to facilitate system upgrades and customization.
This embodiment can be further expanded to add a function for deeper understanding of the user's emotions and improving the quality of communication. For example, by incorporating a facial expression recognition function during video chat into the emotion engine, emotions can be analyzed from visual information as well, resulting in more accurate emotion judgment. To improve the accuracy of emotion recognition, the tone and pitch of the user's voice can also be analyzed and reflected in the text. Furthermore, by analyzing the user's past communication history and response patterns, the system can learn the individual's emotional expression style and adjust the text accordingly in a more personalized manner.
The text processing means has a function to suggest phrases inspired by literary works and poetry to enhance the diversity and creativity of expressions, allowing users to express their emotions more richly. It can also select appropriate expressions according to the environment and situation in which the communication takes place, taking into account the social context and cultural background.
A function that predicts the emotional impact of the sent text on the recipient may be added to the message sending means, allowing users to communicate more responsibly. In addition, the AI can predict the recipient's reaction and, based on that prediction, suggest the communication strategy the user should take next.
Overall, these features will help users to reflect emotions accurately and sensitively in their communications, contributing to building deeper human relationships. These advanced methods can be effectively used by individuals as well as corporate customer support and CRM systems to deepen customer relationships. Furthermore, these systems are expected to be applied in fields where human emotions play an important role, such as user education and counseling.
The system leverages an emotion engine to provide document expression adjustments based on the user's emotions. To extend this functionality, sensors are integrated to capture the user's biometric information, allowing for a more accurate reading of emotions from physiological responses such as heart rate and skin conductivity. Data from the sensors is analyzed in real time, allowing the tone of the document to be adjusted on the fly.
In addition, it is equipped with a personalized learning function that continuously analyzes the user's daily communication to understand the user's unique expression style and preferences, allowing the system to suggest more natural and personalized documents that reflect the user's personality.
Multilingual support is also enhanced so that emotional nuances can be captured in different languages, improving international communication and understanding among multilingual users.
To protect user privacy, functions that anonymize and encrypt emotion data are also added to strengthen security, allowing users to use the system with peace of mind.
With a view to applications in education and counseling, a function is developed that uses the results of emotion recognition to support communication skills training. The training program includes practicing emotional expression and learning appropriate communication techniques.
Finally, to improve usability, the user interface is made rich and intuitive. It supports a variety of gestures and voice commands so that users can easily intervene in the document adjustment process, fine-tune the wording as they see fit, and achieve more personal communication.
(Example 3)
The embodiment of the present invention is a system that includes an emotion engine that recognizes the user's emotions during a call and an adjustment means that adjusts the response and dialogue content to match the user's emotions. Specifically, the user's tone of voice and choice of words are input to the emotion engine during a call to estimate the user's emotions. The generative AI's responses and dialogue content are adjusted based on the estimated emotions. For example, if the user shows sad emotions, words of empathy are used to provide encouragement and support.
The adjustment means is realized by the control unit 46A of the headset type terminal 314 as the function of recognizing the user's emotions during a call and adjusting the response and the content of the dialogue to match those emotions. This adjustment means provides the function of inputting the user's tone of voice and choice of words into the emotion engine during a call, estimating the user's emotions, and adjusting the generative AI's response and the content of the dialogue. The adjustment means may also be realized by the specific processing unit 290 of the data processing device 12. The correspondence between each means and the device or control unit is not limited to the above example, and various modifications are possible.
The voice data collection means includes a microphone with high sensitivity and noise canceling function. The microphone is equipped with a digital signal processing algorithm to remove ambient noise and clearly record the user's voice. The voice data collection means also employs a highly sensitive microphone that captures subtle acoustic characteristics from the user's speech and combines this with digital signal processing technology to effectively remove ambient noise. The microphone accurately captures the characteristics of the user's voice and provides basic data for detecting changes in emotions.
The voice feature extraction means extracts features that may reflect human emotions from the voice signal. This extraction means includes a spectrogram analysis function and a pitch tracking function for performing acoustic feature analysis, and an emotion identification function using an acoustic model. The voice feature extraction means also uses acoustic analysis tools such as a spectrum analysis function and a pitch analysis function to extract features that suggest emotions from the voice signal, and supplies these data to the emotion estimation model.
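As an illustration of the pitch tracking function, the fundamental frequency of a short voiced frame can be estimated by autocorrelation, and loudness by root-mean-square energy. The disclosure does not name an algorithm, so autocorrelation is an assumption here, as are the function names and the 50 to 400 Hz search range. The frame must be longer than `sample_rate / min_hz` samples.

```python
import math
from typing import List


def estimate_pitch_hz(samples: List[float], sample_rate: int,
                      min_hz: float = 50.0, max_hz: float = 400.0) -> float:
    """Estimate pitch as the lag with the highest autocorrelation."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)

    def autocorr(lag: int) -> float:
        return sum(samples[i] * samples[i + lag]
                   for i in range(len(samples) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return sample_rate / best_lag


def rms_energy(samples: List[float]) -> float:
    """Root-mean-square amplitude, a simple loudness feature."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

Features such as these would then be supplied to the emotion estimation model.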
The emotion analysis means includes an emotion estimation model based on machine learning, which estimates the user's emotional state using the features extracted by the voice feature extraction means as input. The emotion estimation model is composed of classifiers such as neural networks, support vector machines, and decision trees trained on training data, and classifies the user's emotions into categories such as positive, negative, and neutral. The emotion estimation model improves its accuracy through continuous learning and evolves so that it can also recognize subtle changes in the user's emotions. In addition, the emotion analysis means has an emotion estimation model that utilizes machine learning technology and analyzes the user's emotions based on the extracted voice features. This model combines various machine learning algorithms to identify emotion categories from the user's utterances and estimates the user's emotional state on that basis. The estimated emotional state is important information for adjusting the system's response to the user's utterances and behavior.
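The classification step can be illustrated with a nearest-centroid classifier, chosen here only because it fits in a few lines; it is a substitute for, not an instance of, the neural networks, support vector machines, and decision trees named above. The two-dimensional feature (normalized pitch and energy) and the function names are assumptions, and any training examples passed in are the caller's own.

```python
import math
from typing import Dict, List, Tuple

Feature = Tuple[float, float]  # (normalized pitch, normalized energy); an assumption


def train_centroids(examples: List[Tuple[Feature, str]]) -> Dict[str, Feature]:
    """Average the feature vectors of each emotion category."""
    sums: Dict[str, Tuple[float, float, int]] = {}
    for (pitch, energy), label in examples:
        p, e, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (p + pitch, e + energy, n + 1)
    return {label: (p / n, e / n) for label, (p, e, n) in sums.items()}


def classify(feature: Feature, centroids: Dict[str, Feature]) -> str:
    """Return the category whose centroid is closest to the feature."""
    return min(centroids, key=lambda label: math.dist(feature, centroids[label]))
```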
The dialogue adjustment means includes a response generation function using a generative AI model, which dynamically adjusts the content of the dialogue according to the emotion estimated by the emotion analysis means. The response generation function utilizes natural language generation technology to generate a response sentence using words and tones appropriate for the user's emotion. For example, if the user is expressing sad emotion, a response including empathetic expressions and comforting words is generated. This response generation function has a context-aware processing function for taking into account the context of the conversation and providing content that matches the user's emotion and the purpose of the communication. In addition, the generated response is constructed to be natural to the user and to meet the emotional needs. The response generation function is realized by a machine learning model trained on a large-scale conversation dataset and can adapt to the user's language and speaking style. In addition, the dialogue adjustment means includes a response generation function using a generative AI model, which generates dialogue content appropriately corresponding to the estimated emotion. This function selects words and a dialogue tone that correspond to the user's emotion and provides a response that matches the user's current emotion and the context of the conversation. The response generation function is designed to perform context-aware processing and provide the information and support required by the user in an appropriate form. This feature is capable of generating responses that adapt to a user's language and emotions based on models trained from large amounts of conversational data.
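One way to make a generative AI model take the estimated emotion and the conversation context into account is to include both in the prompt. The following sketch assumes a plain-text prompt format; the guidance strings, the emotion labels, and the function name `build_prompt` are hypothetical, and the call to the language model itself is omitted.

```python
from typing import List

# Hypothetical tone instructions keyed by estimated emotion.
TONE_GUIDANCE = {
    "sad": "Respond with empathy and offer encouragement.",
    "anxious": "Respond calmly and reassuringly.",
    "neutral": "Respond in a plain, polite tone.",
}


def build_prompt(history: List[str], utterance: str, emotion: str) -> str:
    """Combine tone guidance, prior turns, and the new utterance into one prompt."""
    guidance = TONE_GUIDANCE.get(emotion, TONE_GUIDANCE["neutral"])
    lines = [f"Instruction: {guidance}", "Conversation so far:"]
    lines.extend(history)
    lines.append(f"User: {utterance}")
    lines.append("Assistant:")
    return "\n".join(lines)
```

Unknown emotion labels fall back to the neutral guidance, so a misclassification upstream does not break the response generation function.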
Non-sensor-based data collection methods include text information entered directly by users into the system and metadata obtained from call records. These are used as supplementary information when analyzing user behavioral patterns and preferences, and contribute to improving the accuracy of the emotion engine and optimizing response generation functions. User-entered text information is also used as training data for emotion estimation models as part of the emotion analysis method.
Because the system analyzes the user's emotions in real time during a call and automatically adjusts the dialogue content, facial expression recognition technology can be integrated alongside the voice data to capture even subtler changes in emotion. A webcam or the camera of a smart device is used to analyze the user's facial expressions and improve the accuracy of emotion analysis. With this added facial expression recognition function, the system recognizes the user's emotions more accurately and can adjust the dialogue in response to even subtler emotional changes.
It is also conceivable to incorporate wearable devices to capture the user's physiological signals. Measuring physiological responses such as heart rate and electrodermal activity can provide deeper insight into emotions that cannot be captured by tone of voice or facial expressions alone. Integrating this data can further improve the accuracy of emotion analysis and generate more appropriate dialogue responses.
It would also be effective to add customization features to the dialogue adjustment method that take into account the user's cultural background and personal values. A user profile can be constructed and the tone and content of the dialogue can be further personalized based on that information. This makes it possible to provide detailed support tailored to each individual user.
Furthermore, to promote the evolution of the emotion estimation model, a mechanism is built that collects emotion data through crowdsourcing and incorporates feedback from a variety of users to update the model continuously, resulting in a flexible system that can handle a variety of emotional expressions and languages.
Applications in the fields of education and mental health care can also be considered. For example, in the field of education, it could be used to present educational materials adapted to students' emotions or in counseling sessions. In mental health care, it could recognize the user's emotions and provide dialogue support to reduce stress and anxiety. This would enable a more effective approach to the problems the user is facing.
With regard to privacy protection, security measures are strengthened so that users' emotion data is stored safely and the risk of information leakage is minimized through appropriate access control and encryption technology. This allows users to use the system with peace of mind.
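As one small part of such measures, stored emotion records can be keyed by a pseudonym instead of the raw user identifier. The sketch below uses a keyed hash (HMAC-SHA-256) from the Python standard library; this is pseudonymization only, not the encryption mentioned above, and the function name and key handling are assumptions.

```python
import hashlib
import hmac


def pseudonymize_user_id(user_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym so emotion records need not store the raw ID."""
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
```

The same user always maps to the same pseudonym under a given key, so records can still be grouped per user without exposing the identifier.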
Ultimately, it is hoped that the personalized interaction experience provided by this system will be developed into services that improve the quality of users' lives.
The embodiment of the present invention is applicable not only to telephone calls but also to video conferencing and online education platforms. For example, a lecturer can grasp the emotions of students in real time and adjust the progress of the curriculum according to the emotions, thereby providing a more effective learning experience. Even in video conferencing, dialogue management that reflects the emotions of participants is performed, promoting a productive and positive meeting environment.
The system could also be enhanced to monitor health conditions based on emotional responses: for example, if a user's tone of voice or manner of speaking indicates negative emotions over a period of time, a mental health professional could be notified to intervene if necessary.
Furthermore, to further improve the emotion engine, a function will be incorporated that will analyze the emotional patterns of the user's daily life and provide advice on long-term emotion management and stress reduction based on that information. By analyzing the user's daily rhythm and activity patterns and predicting emotional ups and downs, the system will suggest content for relaxation and motivation at appropriate times.
This system is also expected to be applied in the field of customer support. For example, if call center operators could grasp the customer's emotions in real time and detect negative emotions such as dissatisfaction or anger, they could take immediate action to address the issue, contributing to improving customer satisfaction.
In the fields of games and entertainment, dynamic changes in content based on the user's emotions are expected to have the effect of increasing immersion and enjoyment. In-game characters will react to the player's emotions, changing the story development and dialogue, creating a more personalized experience.
In addition, voice assistants and virtual reality (VR) may be integrated to realize more natural dialogue based on the user's emotions. The voice assistant understands the user's emotions and provides information and services tailored to individual needs. In a VR environment, the scenario and surroundings change according to the user's emotions, providing an experience that matches those emotions in real time.
This system also has the potential to revolutionize the design of user interfaces (UI) and user experiences (UX). Using emotion recognition technology, it can provide UI and UX optimized for the user's emotions, increasing user satisfaction. For example, websites and applications can grasp the user's emotions in real time and adjust the way content is presented and the form of interaction.
Ultimately, it is hoped that the emotion regulation function provided by this system will become widely used throughout society as a tool to improve the quality of human relationships and increase the effectiveness of communication.

（形態例１）
ステップ１：未登録または非通知の電話番号からの着信があった場合、生成系AIが応答する。
ステップ２：通話終了後、用件をチャットGPTで文書化する。
ステップ３：文書化された用件を感情エンジンに入力し、ユーザの感情を分析する。
ステップ４：分析結果に基づいて、文書の表現を調整する。
ステップ５：調整された文書をメッセンジャーアプリやメールで送信する。
（形態例２）
ステップ１：通話終了後、用件をチャットGPTで文書化する。
ステップ２：文書化された用件を感情エンジンに入力し、ユーザの感情を分析する。
ステップ３：分析結果に基づいて、文書の表現を調整する。
ステップ４：調整された文書を選択し、送信先を指定する（例：家族）。
ステップ５：指定された送信先に文書をメッセンジャーアプリやメールで送信する。
（形態例３）
ステップ１：通話中にユーザの声のトーンや言葉の選択などを感情エンジンに入力し、ユーザの感情を分析する。
ステップ２：分析結果に基づいて、生成系AIの応答や対話の内容を調整する。
ステップ３：通話終了後、用件をチャットGPTで文書化する。文書化された用件は、感情エンジンに入力するために使用される。
(Example 1)
Step 1: When a call comes in from an unregistered or withheld phone number, the generative AI answers.
Step 2: After the call ends, the matter of the call is documented using ChatGPT.
Step 3: The documented matter is input into the emotion engine, and the user's emotions are analyzed.
Step 4: Based on the analysis results, the wording of the document is adjusted.
Step 5: The adjusted document is sent via a messenger app or email.
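The five steps of Example 1 can be sketched as a small pipeline. The following is only an illustrative sketch under stated assumptions, not the implementation of the disclosure: `Call`, `handle_call`, and the four callables passed in (`summarize`, `analyze_emotion`, `adjust`, `send`) are hypothetical stand-ins for the generative AI, the emotion engine, and the messaging back end, none of which are specified in this description. The phone number is a made-up placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass
class Call:
    caller_number: Optional[str]  # None models a withheld number (assumption)
    transcript: str               # text of the call, assumed already transcribed


def handle_call(
    call: Call,
    contacts: Set[str],
    summarize: Callable[[str], str],
    analyze_emotion: Callable[[str], str],
    adjust: Callable[[str, str], str],
    send: Callable[[str], None],
) -> Optional[str]:
    """Run Steps 1-5 for one call; return the sent text, or None if skipped."""
    # Step 1: only unregistered or withheld numbers are handled by the AI.
    if call.caller_number is not None and call.caller_number in contacts:
        return None
    summary = summarize(call.transcript)   # Step 2: document the matter
    emotion = analyze_emotion(summary)     # Step 3: emotion engine
    message = adjust(summary, emotion)     # Step 4: adjust the wording
    send(message)                          # Step 5: messenger app or email
    return message


# Usage with trivial stand-ins for each component.
sent = []
result = handle_call(
    Call(None, "Hello, this is the clinic. Please call back about Friday."),
    contacts={"+810000000001"},
    summarize=lambda text: text.split(". ", 1)[-1],
    analyze_emotion=lambda summary: "neutral",
    adjust=lambda summary, emotion: f"[{emotion}] {summary}",
    send=sent.append,
)
print(result)
```

Passing the components in as callables keeps the step order fixed while leaving open which model or messaging service performs each step.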
(Example 2)
Step 1: After the call ends, document the matter of the call using ChatGPT.
Step 2: Input the documented matter into the emotion engine to analyze the user's emotions.
Step 3: Adjust the wording of the document based on the analysis results.
Step 4: Select the adjusted document and specify the recipient (e.g., family).
Step 5: Send the document to the specified recipient via messenger app or email.
(Example 3)
Step 1: During a call, the user's tone of voice, choice of words, etc. are input into the emotion engine to analyze the user's emotions.
Step 2: Based on the analysis results, adjust the generative AI's responses and dialogue content.
Step 3: After the call ends, document the matter of the call using ChatGPT. The documented matter is used as input to the emotion engine.
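Steps 1 and 2 of Example 3 (estimating emotion during the call and adjusting the reply) might look like the following minimal sketch. It is rule-based purely for illustration: the cue-word list, the `tone_score` input, the 0.7 threshold, and the function names are all assumptions, and an actual emotion engine would presumably use a trained model rather than a word list.

```python
from typing import Set

# Hypothetical cue words; a real emotion engine would use a trained model.
NEGATIVE_WORDS: Set[str] = {"angry", "upset", "worried", "afraid"}


def estimate_emotion(utterance: str, tone_score: float) -> str:
    """Step 1: classify one utterance as 'negative' or 'neutral'.

    tone_score is an assumed value in [0, 1] from an upstream audio model,
    where higher means a more agitated tone of voice.
    """
    words = {w.strip(".,!?").lower() for w in utterance.split()}
    if words & NEGATIVE_WORDS or tone_score > 0.7:
        return "negative"
    return "neutral"


def choose_reply_style(emotion: str) -> str:
    """Step 2: pick how the generative AI should phrase its next reply."""
    return "empathetic" if emotion == "negative" else "standard"


print(choose_reply_style(estimate_emotion("I am very worried about this!", 0.1)))
```

Either signal alone (word choice or tone) is enough to switch the reply style in this sketch, which mirrors the description's use of both tone of voice and choice of words.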

特定処理部２９０は、特定処理の結果をヘッドセット型端末３１４に送信する。ヘッドセット型端末３１４では、制御部４６Ａが、スピーカ２４０及びディスプレイ３４３に対して特定処理の結果を出力させる。マイクロフォン２３８は、特定処理の結果に対するユーザ入力を示す音声を取得する。制御部４６Ａは、マイクロフォン２３８によって取得されたユーザ入力を示す音声データをデータ処理装置１２に送信する。データ処理装置１２では、特定処理部２９０が音声データを取得する。 The specific processing unit 290 transmits the result of the specific processing to the headset type terminal 314. In the headset type terminal 314, the control unit 46A causes the speaker 240 and the display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating a user input for the result of the specific processing. The control unit 46A transmits audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

上記実施形態では、データ処理装置１２によって特定処理が行われる形態例を挙げたが、本開示の技術はこれに限定されず、ヘッドセット型端末３１４によって特定処理が行われるようにしてもよい。
［第４実施形態］
In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of the present disclosure is not limited to this, and the specific processing may be performed by the headset type terminal 314.
[Fourth embodiment]

図７には、第４実施形態に係るデータ処理システム４１０の構成の一例が示されている。 Figure 7 shows an example of the configuration of a data processing system 410 according to the fourth embodiment.

図７に示すように、データ処理システム４１０は、データ処理装置１２及びロボット４１４を備えている。データ処理装置１２の一例としては、サーバが挙げられる。 As shown in FIG. 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

ロボット４１４は、コンピュータ３６、マイクロフォン２３８、スピーカ２４０、カメラ４２、通信Ｉ／Ｆ４４、及び制御対象４４３を備えている。コンピュータ３６は、プロセッサ４６、ＲＡＭ４８、及びストレージ５０を備えている。プロセッサ４６、ＲＡＭ４８、及びストレージ５０は、バス５２に接続されている。また、マイクロフォン２３８、スピーカ２４０、カメラ４２、及び制御対象４４３も、バス５２に接続されている。 The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication I/F 44, and a control target 443. The computer 36 includes a processor 46, a RAM 48, and a storage 50. The processor 46, the RAM 48, and the storage 50 are connected to a bus 52. The microphone 238, the speaker 240, the camera 42, and the control target 443 are also connected to the bus 52.

制御対象４４３は、表示装置、目部のＬＥＤ、並びに、腕、手及び足等を駆動するモータ等を含む。ロボット４１４の姿勢や仕草は、腕、手及び足等のモータを制御することにより制御される。ロボット４１４の感情の一部は、これらのモータを制御することにより表現できる。また、ロボット４１４の目部のＬＥＤの発光状態を制御することによっても、ロボット４１４の表情を表現できる。 The control target 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, legs, and the like. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, legs, and the like. Some of the emotions of the robot 414 can be expressed by controlling these motors. The facial expressions of the robot 414 can also be expressed by controlling the light emission state of the LEDs in the eyes of the robot 414.

図８には、データ処理装置１２及びロボット４１４の要部機能の一例が示されている。図８に示すように、データ処理装置１２では、プロセッサ２８によって特定処理が行われる。ストレージ３２には、特定処理プログラム５６が格納されている。 Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, in the data processing device 12, a specific process is performed by the processor 28. A specific process program 56 is stored in the storage 32.

ロボット４１４では、プロセッサ４６によって受付出力処理が行われる。ストレージ５０には、受付出力プログラム６０が格納されている。プロセッサ４６は、ストレージ５０から受付出力プログラム６０を読み出し、読み出した受付出力プログラム６０をＲＡＭ４８上で実行する。受付出力処理は、プロセッサ４６がＲＡＭ４８上で実行する受付出力プログラム６０に従って、制御部４６Ａとして動作することによって実現される。 In the robot 414, the reception output process is performed by the processor 46. A reception output program 60 is stored in the storage 50. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output process is realized by the processor 46 operating as the control unit 46A in accordance with the reception output program 60 executed on the RAM 48.

（形態例１）
本発明を実施するための形態は、未登録または非通知の電話番号からの着信時に、生成系AIが応答する応答手段と、通話終了後に用件をチャットGPTで文書化し、メッセンジャーアプリやメールで送信する送信手段とを具備するシステムである。さらに、家族や知人と名乗る場合には、専用の質問や合言葉を使用して確認する確認手段と、配送業者は業者用の合言葉を使用して本人につながる接続手段と、通話中には電話番号をチェックし、悪用歴のある番号の場合は警察に連携する連携手段とを具備する。
応答手段は、データ処理装置１２の特定処理部２９０によって実現され、未登録または非通知の電話番号からの着信に対して生成系AIが応答する。送信手段は、通話終了後に用件をチャットGPTで文書化し、メッセンジャーアプリやメールで送信する手段として、データ処理装置１２の特定処理部２９０によって実現される。確認手段は、例えば、ロボット４１４の制御部４６Ａによって実現され、専用の質問や合言葉を使用して家族や知人を確認する。接続手段は、配送業者が業者用の合言葉を使用して本人につながる手段として、ロボット４１４の制御部４６Ａによって実現される。連携手段は、電話番号をチェックし、悪用歴のある番号の場合に警察に連携する手段として、データ処理装置１２の特定処理部２９０によって実現される。各手段と装置や制御部との対応関係は、上述した例に限定されず、種々の変更が可能である。
応答手段は、音声認識技術を活用した発話理解機能が含まれ、未登録または非通知の番号からの着信に対して、事前に訓練されたAIが生成した自然言語による応答を行う。このAIは、大量の通話データを学習しており、さまざまなシナリオに対応できる回答を生成する能力を有する。着信があると、AIはリアルタイムで着信内容を解析し、適切な応答を選択して会話を進める。応答内容は、通話の性質に合わせてカスタマイズされるため、営業電話からの質問には丁寧に対応し、不審な着信に対しては慎重に応答する。また、応答手段は、未登録または非通知の電話番号からの着信に対応するため、音声合成技術と発話内容理解技術を駆使し、事前に訓練を受けたAIが適切な対応を行うこともできる。このAIは複数の応答シナリオを学習し、通話の意図を把握して、状況に応じた自然な対話を提供することができる。着信内容に基づいてAIが瞬時に応答を選択し、通話の趣旨に合わせてカスタマイズされた対応を行うため、営業電話の問い合わせには適切に対処し、怪しい着信には慎重に応じることができる。
送信手段には、通話内容をテキスト化する音声認識技術と、そのテキストを基に通話の要約を生成する自然言語生成技術が含まれる。通話が終了すると、AIは通話内容をテキストデータに変換し、その内容を要約して文書化する。生成された文書は、メッセンジャーアプリやメールなどのコミュニケーション手段を介して、ユーザーに送信される。このプロセスにより、ユーザーは通話の詳細を後からでも確認でき、重要な情報を見逃すことがない。また、送信手段では、終了した通話の内容を音声認識技術によりテキストに変換し、その後、自然言語処理技術を用いて要約を行い、生成された文書をメッセンジャーアプリやメールでユーザーに送付することもできる。ユーザーは送付された文書を介して通話の詳細をいつでも確認可能で、重要な情報が記録され、見落とされることがなくなる。
確認手段は、家族や知人からの着信と名乗る場合に、特定の質問や合言葉を使用して本人確認を行う機能を有する。この手段は、ユーザーが設定した質問や合言葉をAIが使用し、着信者が正しい応答を行うことで本人であることを確認する。本人確認が成功すると、AIは通話を継続し、必要に応じてユーザーに転送する。このプロセスにより、不正アクセスや詐欺を防ぎながら、本人であれば円滑な通信を確保する。また、確認手段は、家族や知人を名乗る着信者に対して、ユーザーが設定した質問や合言葉をAIが提示し、正確な回答が得られた場合にのみ、通話を継続またはユーザーに転送することもできる。この機能により、不正なアクセスや詐欺を未然に防ぎつつ、本人であることが確認された場合にはスムーズなコミュニケーションを実現する。
接続手段は、配送業者などの特定の職種の者が業者用の合言葉を使用することで本人と通話ができる機能を有する。この手段では、AIが業者用の合言葉を受け取り、正しい合言葉であることを確認した後、通話を本人に繋ぐ。業者用の合言葉は、通話のセキュリティを確保するために重要であり、誤った合言葉が入力された場合、通話は繋がらない。また、接続手段は、配送業者などの職種固有の合言葉をAIが受け取り、その正当性を確認した上で、通話をユーザーにつなぐ機能を担うこともできる。この合言葉によって、通話のセキュリティが確保され、誤った合言葉の場合には通話が切断される仕組みとなっている。
連携手段は、通話中に電話番号をリアルタイムでチェックし、その番号に悪用歴がある場合は自動的に警察や関連機関に連携する機能を有する。この手段は、データベースに蓄積された不審な番号のリストを参照し、着信番号がそのリストに含まれているかを確認する。不審な番号からの通話であると判断された場合、システムは即座に警察への報告プロトコルを開始し、状況に応じて適切な対応を取る。また、連携手段は、通話中の番号をリアルタイムで監視し、過去に悪用された履歴がある番号であれば自動的に警察などの関連機関と連携する機能を果たすこともできる。不審な番号の検出時には、警察への通報プロトコルが起動し、迅速な対応が取られる。
これらの手段は、ユーザーのセキュリティと利便性を高めるために、複数のAI技術と連携機能を組み合わせて実装される。データ処理装置とスマートデバイスの制御部が協力し、これらの手段を柔軟に実現するためのプラットフォームが提供される。各手段は、独立して機能するだけでなく、連動してより高度なサービスを提供するために設計される。ユーザーは、これらの手段を通じて、未登録または非通知の電話番号からの着信に対する対応を自動化し、日常生活の中で発生する潜在的なリスクから自己を守ることができる。また、これらの手段は、AI技術を中心に構築され、ユーザーのセキュリティを向上させると同時に、日常のコミュニケーションを効率化する。センサーを使用しないデータ収集の代替手段として、ユーザーが手動で情報を入力し、システムがその情報を利用するプロセスが可能である。ユーザーはこれらの手段によって、未登録または非通知の着信に対する対処を自動化し、不測の事態からの保護を強化することができる。
本形態例のシステムには、さらなる機能を追加し、その利便性を高めることができる。例えば、応答手段には、会話中にユーザーの感情を解析する機能を追加し、通話相手の声のトーンや話し方から感情を推定し、それに応じた応答をAIが行う。これにより、通話相手が怒りや不安を感じている場合は、より慎重かつ共感的な応答が可能となり、通話の品質を向上させる。
送信手段には、文書化した通話内容をユーザーのカレンダーやリマインダーに自動的に統合する機能を追加する。これにより、通話で得たアポイントメントやタスクを忘れずに管理できるようになり、生産性の向上を図ることができる。
確認手段は、ユーザーの生体情報を利用して本人確認を行う機能を備える。例えば、スマートウォッチやフィットネストラッカーから得た生体情報を用いて、通話相手が実際に家族や知人であることを確認する。これにより、より高度なセキュリティを実現しつつ、利用者の手間を減らすことができる。
接続手段には、配送業者がQRコード（登録商標）またはNFCタグをスキャンすることで認証を行い、通話を本人に直接繋ぐ機能を追加する。これにより、合言葉のやり取りを省略し、より迅速かつ安全に配送業者との連携を図ることが可能となる。
連携手段では、通話中の音声データから不審なキーワードを検出し、その内容を自動で分析する機能を追加する。キーワードに基づいて不審な通話と判断された場合、警察への通報を行う前に、まずはユーザーに警告することで、より正確な判断をサポートする。
これらの追加機能により、本形態例のシステムは、通話の自動応答だけでなく、日々の生活の中でのセキュリティやスケジュール管理をより効率的にサポートする。また、各種のAI技術とデータベースの連携によって、ユーザーの生活に更なる安心と便利さを提供する。さらに、これらの技術をユーザーのスマートデバイスや家庭内のIoT機器と統合することで、よりパーソナライズされた体験を実現することが可能となる。
本形態例のシステムには、さらなる機能拡張が可能であり、利便性およびセキュリティを高めるために応答手段には、発信者の声紋を分析し、登録済みの家族や知人の声と照合する機能を追加することができる。これにより、発信者が本人であるかをより正確に判断し、特定の質問や合言葉を使用しなくても安全に通話を継続することが可能になる。
送信手段では、音声認識と自然言語処理を活用して得られた通話内容を、ユーザーが選択した複数の言語に翻訳し、異なる言語を話すユーザー間のコミュニケーションを支援する機能を組み入れることができる。これにより、国際的なビジネスシーンや多言語を話す家庭内での利用が促進される。
確認手段には、ユーザーが特定のジェスチャーや動作をカメラに示すことで本人確認を行う機能を追加することができる。この機能により、通話中に手軽にかつ迅速に本人確認を行うことが可能となり、セキュリティが一層強化される。
接続手段に関しては、AIが発信者の位置情報と配送予定情報を照合し、本人確認を行う機能を追加することで、配送業者が実際に商品を配送している場所から通話していることを確認し、通話の真正性を保証することが可能になる。
連携手段では、不審な通話が検出された際に、警察だけでなく、ユーザーの指定した緊急連絡先にも自動通報する機能を追加することで、万が一の緊急時に迅速な対応が行えるようになる。
これらの機能追加は、既存のシステムの基本的な構造を維持しつつ、ユーザーのニーズに合わせて柔軟な対応が可能となる。また、スマートデバイスや家庭内のIoT機器と連携することで、ユーザーが日常的に使用する環境においてもシームレスな経験を提供する。これらの追加機能により、本形態例のシステムは通話の自動応答を超え、日常生活のあらゆる面でユーザーのセキュリティと利便性を向上させる。
（形態例２）
本発明を実施するための形態は、通話終了後に用件をチャットGPTで文書化し、メッセンジャーアプリやメールで送信する際に、家族宛に送信可能な送信手段を具備するシステムである。具体的には、通話終了後に生成された文書を選択し、送信先を家族と指定することで、家族宛に用件を送信することができる。
送信手段は、通話終了後に用件をチャットGPTで文書化し、メッセンジャーアプリやメールで家族宛に送信する手段として、データ処理装置１２の特定処理部２９０によって実現される。各手段と装置や制御部との対応関係は、上述した例に限定されず、種々の変更が可能である。
通話終了後の内容を文書化する手段は、音声認識技術を用いた会話内容解析手段を含む。この解析手段は、通話の音声データをテキストに変換し、通話の要点を抽出する。また、音声認識技術を用いた会話内容解析手段は、通話が終了した後に発生する音声データをテキストに変換し、そのテキストデータから通話の主要なポイントを抽出する機能を持つ。音声認識技術は、ディープラーニングに基づくモデルを活用し、さまざまな言語やアクセントに対応するためのトレーニングが行われる。また、この解析手段は、多様な言語やアクセントに対応するためにディープラーニングモデルを活用し、大量の音声データを基に学習を進める。会話内容解析手段は、形態素解析や構文解析を行い、会話の中で交わされた重要な情報や行動を要する内容を特定する。特に、キーワード抽出機能を用いて会話の中で頻繁に使われる単語やフレーズを特定し、それらを基に通話の要点を文書化する。また、会話の要点を明確にするために、形態素解析機能と構文解析機能を組み合わせ、会話中に交わされた重要な情報やアクションを要する内容を特定する。キーワード抽出機能は、会話の中で頻出する単語やフレーズを識別し、それらの情報を基に文書化する。
文書化された内容をメッセージとしてフォーマットする手段は、テキストエディタ機能を含む。この機能は、解析されたテキストを整理し、文書の構造を整える。メッセージフォーマット手段は、ユーザが容易に内容を確認し、必要に応じて編集や追加情報を加えることができるインタフェースを提供する。また、文書化された内容をメッセージ形式に整える手段には、テキストエディタ機能が含まれ、解析されたテキストを整理し、文書のレイアウトを整える。ユーザがメッセージの内容を確認し、編集や追加情報を加えることができるインターフェイスを提供する。
送信手段は、メッセージを宛先指定機能を用いて特定の家族に送信する。この宛先指定機能は、ユーザの連絡先リストと連携し、選択された家族メンバーの連絡先情報に基づいてメッセージを自動的に送信する。また、送信手段は、メッセージ送信の確認と送信履歴を管理する機能も備えており、ユーザは送信されたメッセージの状態を追跡し、必要に応じて再送信を行うことができる。メッセージの送信は、メッセンジャーアプリやメールアプリとの連携によって行われる。メッセンジャーアプリやメールアプリとの連携機能は、ユーザのアカウント情報を用いて認証を行い、安全にメッセージを送信する。また、送信手段は、セキュリティ対策としてメッセージの暗号化や送信時の認証プロセスを実施し、プライバシーの保護を確保する。また、送信手段は、ユーザが選択したメッセージ送信方法に応じて、メッセージを適切なフォーマットで送信する。例えば、メッセンジャーアプリではインスタントメッセージとして、メールアプリでは電子メールとして送信する。メッセージの送信手段は、ユーザが選択した家族メンバーに対してメッセージを送るための宛先指定機能を備えており、ユーザの連絡先リストと連携して、家族メンバーの連絡先に基づいたメッセージの自動送信を行う。送信手段には、メッセージの送信状況を確認し、送信履歴を管理する機能も含まれており、ユーザは送信したメッセージの状態を追跡し、必要に応じて再送信を行うことができる。メッセージの送信プロセスは、メッセンジャーアプリやメールアプリと連携し、ユーザのアカウント情報を基に認証を行い、安全にメッセージを送信する機能を持つ。送信手段は、メッセージの内容を暗号化し、送信時の認証を行うセキュリティ対策を施して、プライバシーを守る。また、メッセンジャーアプリではインスタントメッセージとして、メールアプリでは電子メールとして、ユーザが選択した送信方法に応じて適切なフォーマットでメッセージを送信する。
センサーを含まないデータ収集の例としては、ユーザが手動で通話内容をメモする場合が考えられる。この場合、ユーザは通話終了後に自分で通話の要点を記録し、そのテキストデータをメッセージとして家族に送信する。ユーザが手動で記録した通話内容は、テキストエディタ機能を用いて整理され、フォーマットされたメッセージとして送信手段によって家族宛に送信される。この手動記録は、音声認識技術を用いた自動文書化が適用できない状況や、ユーザが特定の情報を自らの言葉で伝えたい場合に適している。また、センサーを用いないデータ収集の方法として、ユーザが通話終了後に手動で通話内容をメモし、その情報をメッセージとして家族に送るシナリオも考えられる。この手動記録は、テキストエディタ機能を用いて整理され、フォーマットされたメッセージとして送信される。音声認識技術が適用できない状況や、ユーザが自らの言葉で特定の情報を伝えたい場合に利用される。
本発明の実施形態では、通話内容の文書化を超えた機能を考慮することができる。例えば、音声認識によって生成されたテキストに基づき、スケジュール管理システムと連携して、通話中に言及されたアポイントメントや予定を自動的にカレンダーに登録する機能を追加する。これにより、ユーザーは通話後に手動でスケジュールを管理する手間を省くことができる。さらに、家族間で共有されるカレンダーへの予定登録を提案し、家族全員が予定を共有しやすくする。
また、文書化されたテキストデータを基に、自動的にタスクリストを生成し、家族全員がアクセスできる共有プラットフォームに投稿する機能を設けることも可能である。このプラットフォームでは、各家族メンバーがタスクの進捗を更新したり、完了したタスクにチェックを入れることができ、家族全員で情報を共有し協力する環境を構築する。
さらに、文書化されたメッセージに対して、感情分析を行い、通話中の感情的なニュアンスをテキストに反映させる機能を追加することで、メッセージの受取人が発信者の意図をより正確に理解することを助ける。例えば、通話中に喜びや心配といった感情が表れた場合、その感情をテキストに特定の絵文字やフォーマットで表現し、コミュニケーションの豊かさを高める。
また、音声認識と解析を活用して、通話内容から自動的にFAQやよくある質問リストを生成し、家族が同様の問い合わせをする際に参照できる知識ベースを構築する機能も考えられる。この知識ベースは、家族内で共有され、新たな通話が発生するたびに更新されることで、家族間のコミュニケーションの効率を向上させる。
さらに、通話終了後に生成されるテキストは、メッセンジャーアプリやメールで送信するだけでなく、音声形式で再生する機能を付加することで、視覚障害のある家族メンバーや読み書きが苦手な子供でも情報を容易に受け取れるようにする。
最後に、通話終了後に文書化された内容を、家族メンバーのプライバシーを保護するために、文書内の機微な情報を識別し、自動的に匿名化や伏せ字処理を行う機能を組み込むことで、安心して情報を共有できる環境を提供する。これにより、個々のプライバシーを尊重しつつ、必要な情報のみを共有するバランスを保つことができる。
本発明の実施形態は、通話内容の自動文書化と送信に関するものであり、これに新たな機能を追加することが考えられる。例えば、通話内容を分析し、通話終了後に自動でアクションアイテムを生成し、対応が必要なタスクとしてユーザーのスマートデバイスにリマインダーをセットする機能が考えられる。このリマインダーは、通話で言及された期限や重要性に基づいて優先度を設定し、ユーザーが忘れずに行動に移せるようサポートする。
さらに、家族間でのコミュニケーションを強化するために、文書化されたメッセージ内の特定の単語やフレーズに基づいて、関連する画像やビデオ、リンクを自動的に添付する機能を追加することも有益である。これにより、テキストベースのメッセージだけでなく、視覚的な情報も共有でき、コミュニケーションがより豊かになる。
また、通話内容の文書化に際して、プライバシーに配慮し、特定の個人情報や機密情報を自動的に検出し、ブラー処理や伏せ字に変換する機能を実装することで、セキュリティを高めることができる。このプロセスは、自然言語処理技術とプライバシー保護のガイドラインに従って行われる。
通話内容のテキスト化では、ユーザーの多様なニーズに対応するため、複数の言語への翻訳機能を組み込むことも有効である。家族が異なる言語を話す多文化の環境では、通話内容を自動的に翻訳し、各メンバーが理解しやすい言語でメッセージを送信することが可能となる。
さらなる利便性を追求するために、メッセージの送信タイミングをユーザーがカスタマイズできるスケジュール機能を追加する。ユーザーは、即時送信だけでなく、特定の日時にメッセージを送信するよう設定できるため、家族が情報を受け取るタイミングを最適化できる。
最後に、メッセージの受け取り側で、受信したメッセージに対するアクションを簡単にとれるよう、返信や確認のためのクイックアクションボタンを設けることで、迅速なフィードバックと効率的なコミュニケーションを実現する。これにより、家族間での情報共有がさらにスムーズに行われるようになる。
（形態例３）
本発明を実施するための形態は、通話中に電話番号をチェックし、悪用歴のある番号の場合には警察に連携する際に、警察との連携手段を具備するシステムである。具体的には、通話中に着信番号をデータベースと照合し、悪用歴のある番号であることを検出した場合には、自動的に警察に通報する機能を備えている。
警察との連携手段は、通話中に着信番号をデータベースと照合し、悪用歴のある番号を検出した場合に自動的に警察に通報する機能として、データ処理装置１２の特定処理部２９０によって実現される。各手段と装置や制御部との対応関係は、上述した例に限定されず、種々の変更が可能である。
抽出手段は、通信網を介して発信される各通話の信号に含まれる発信者情報を抽出するために備わっており、デジタル信号処理技術を用いて通話データから発信者の電話番号を正確に取得することができる。また、抽出手段は、Caller ID情報を解析し、電話番号を特定するための信号解析機能を持っている。このシステムは、通話信号の中から発信者の電話番号を抽出するための信号抽出機能と、Caller ID情報の解読に特化した解析アルゴリズムを利用して、着信番号を正確に特定することもできる。
照合手段は、抽出された電話番号を不正利用が疑われる電話番号を収集し、カテゴリ別に整理したデータベースに照合する機能を有しており、データベース管理システムを通じてリアルタイムでの照合処理が可能であり、高速な検索アルゴリズムとインデックス技術により、通話が進行している間に迅速な照合が行われる。また、照合手段は、悪用歴のある番号をリスト化したデータベースを参照し、迅速な検索と照合を行うための高性能なデータベース検索機能とインデックス機能を備えている。
連携手段は、悪用歴のある番号が検出された場合に自動的に警察に通報する機能を持ち、通報システムとのインターフェイス機能が含まれ、通報する際の警察の受付システムとのプロトコルに基づいたデータ形式で通報情報を生成し、安全な通信チャネルを用いて警察の受付システムに送信する。通報情報の内容には、検出された悪用歴のある電話番号、通話の日時、通話の持続時間、発信者が利用している通信事業者などの情報が含まれ、個人情報の保護や通報の正確性を確保するために、暗号化技術や認証システムが用いられる。また、連携手段は、複数の通信プロトコルや通報システムとの互換性を持ち、システム間のデータ交換を円滑に行うためのアダプター機能を設け、通報のプロセスにおいて、警察の受付システムの要件に合わせて通報情報のフォーマットを調整し、適切な通報プロトコルを選択して通報を行う機能を持つ。通報プロセスが発動されると、システムは警察の受付システムに対して、通報情報を送信し、通報の受付確認を取得する。この確認は、通報が正しく行われたことをシステムが記録し、通報履歴として保存するための情報として利用される。通報履歴は、将来的な分析や改善のために用いられ、通報プロセスの効率化や精度向上に寄与する。この連携機能は、通報データ生成機能により、必要な情報を含む通報フォーマットを作成し、セキュアな通報送信機能を介して警察の受付システムへ情報を送信する。通報のセキュリティと信頼性を保証するため、データの暗号化機能とシステム認証機能が導入されている。また、システムの連携手段は、複数の通報プロトコルと互換性を持ち、異なる通報システムとのデータ交換を実現するアダプター機能を有しており、このアダプター機能は、通報プロトコル選択機能により、通報時のプロトコル要件に適した形式に自動的に調整し、通報情報の送信と受付確認を行う。通報履歴記録機能は通報の成功を記録し、システムのパフォーマンス分析や改善に使用される。
データ収集手段には、センサーを用いない例として、ユーザがアプリケーションやウェブインターフェース上で疑わしい通話に関する報告を行う機能があり、ユーザは、通話の経験や通話中に感じた不審な点をフォームに入力し、その情報がデータベースに登録される。この手動報告により収集されたデータは、自動的にデータベースに照合される電話番号のリストに追加される可能性があり、悪用歴のある番号の検出精度の向上に寄与する。また、センサーを使用しないデータ収集例として、ユーザが疑わしい通話について報告するための入力機能が提供される。ユーザはインタラクティブな報告フォームを通じて、疑わしい通話の内容をデータベースに登録し、これにより収集された情報は悪用歴のある番号の検出に活用される。この手動報告システムは、ユーザの経験と感覚に基づいて追加データを提供し、番号照合データベースの拡張に貢献する。
このシステムには、データベースの更新メカニズムを強化する機能を追加することができる。例えば、新たに悪用が確認された番号は、通報後も自動的にデータベースに追加される。さらに、疑わしい通話が報告された際には、その番号の信用情報を他のデータベースとも照合し、ユーザーからの報告に基づく情報と組み合わせることで、より正確な悪用歴の特定を実現する。データベースの整合性を保つために、定期的なクリーニングプロセスを実行し、誤った情報や古いデータを排除する仕組みも設けられる。また、通報システムとの連携を強化するため、警察が提供する犯罪データベースと直接連携し、照合プロセス中にリアルタイムで犯罪情報を取得し、照合結果の精度を向上させる機能も導入される。
通報の即時性を高めるために、通話が開始された瞬間に照合プロセスが開始され、悪用歴のある番号が検出された場合、通話者に警告音を出すか、自動的に通話を遮断するオプションも設けられる。さらに、通話を遮断した際には、通報者に代わって通話内容の録音を保存し、警察の調査に役立てることができる。警察が介入する際には、通報者の位置情報や通話履歴を含む詳細なレポートが自動生成され、犯罪捜査の迅速化を支援する。
ユーザインターフェースには、通報システムの透明性を高めるために、通報プロセスの進行状況をリアルタイムで確認できる機能が追加される。通報の結果や警察からのフィードバックをユーザが確認できるようにすることで、システムへの信頼性を向上させる。また、悪用歴のある番号に関する統計データやトレンド分析を提供し、ユーザが通話に対する警戒心を持つための情報提供も行われる。
さらに、悪用歴のある番号を特定するための機械学習技術を導入し、通話パターンや通話の頻度などの様々な指標を分析することで、悪用の可能性が高い新たな番号を予測する。これにより、データベースの予防的な更新が可能となり、未知の犯罪行為を防ぐための対策を強化する。また、ユーザが通報システムの効果について直接フィードバックを提供できる機能を設け、システムの改善に役立てる。フィードバックは匿名で行われることで、ユーザのプライバシーを保護しつつ、システムの改善に資する貴重な情報を収集する。
通話中の電話番号チェックをより効果的にするためには、ユーザーが直面する可能性のある様々な詐欺のパターンをAIが学習し、特定の単語やフレーズが通話中に検出された際にリアルタイムでフラグを立てる機能を実装する。これにより、単に電話番号がデータベース内の悪用歴と一致するだけでなく、通話の内容からも悪意を持った行動を推測し、検出することが可能になる。また、ユーザーが詐欺を疑う通話を簡単に報告できるショートカットやボタンをスマートフォンのインターフェースに設け、報告プロセスを簡略化する。これにより、データベースはより迅速に更新され、他のユーザーに対する保護が向上する。
警察との連携を強化するためには、通報された情報を基に警察が迅速に対応できるよう、通報システムに位置情報追跡機能を統合し、犯罪者の追跡と捕捉を支援する。また、通報システムに組み込まれる人工知能は、通報データから犯罪パターンを分析し、予防的な警戒活動を計画するための情報を警察に提供する。このような予測分析を活用することで、将来的な犯罪を未然に防ぐことに繋がる。
警察とのデータ共有を促進するために、警察が把握している詐欺事件やその他の犯罪に関する情報をリアルタイムで受け取り、データベースを更新する機能を設ける。これにより、通報システムは最新の犯罪情報に基づいて機能し、ユーザーを守るための対策が強化される。さらに、システムにはブロックリスト機能を追加し、ユーザーが自身で疑わしい番号を登録して通話を拒否できるようにする。これにより、ユーザー自身が直接リスクをコントロールすることが可能になる。
教育プログラムとして、ユーザーが詐欺の手口を認識し、予防するための情報を提供するオンライン講座やワークショップを開催する。これにより、ユーザーは自分自身を守るための知識を得ることができ、社会全体のセキュリティ意識が向上する。また、通報システムの利用によって防がれた詐欺事件の事例を共有し、ユーザーがシステムの実効性を理解しやすくする。
最後に、システムのアップデートを通じて、通話が詐欺である可能性が高いと判断された場合に、ユーザーに自動的に警告メッセージを送信し、詐欺に対する警戒を促す機能を実装する。これにより、ユーザーは即座に詐欺である可能性を認識し、適切な対応を取ることができるようになる。
(Example 1)
The embodiment of the present invention is a system that includes a response means by which a generative AI responds when a call is received from an unregistered or withheld phone number, and a sending means that documents the matter of the call using ChatGPT after the call ends and sends it via a messenger app or email. The system further includes a confirmation means that verifies a caller claiming to be a family member or acquaintance by using a dedicated question or password, a connection means that connects a delivery company to the user when the company uses a password for delivery companies, and a linking means that checks the phone number during the call and links with the police if the number has a history of misuse.
The response means is realized by the specific processing unit 290 of the data processing device 12, and the generative AI responds to an incoming call from an unregistered or withheld phone number. The sending means is realized by the specific processing unit 290 of the data processing device 12 as a means for documenting the matter using ChatGPT after the call ends and sending it via a messenger app or email. The confirmation means is realized by, for example, the control unit 46A of the robot 414, and verifies family members and acquaintances using dedicated questions and passwords. The connection means is realized by the control unit 46A of the robot 414 as a means by which a delivery company is connected to the user by using a password for delivery companies. The linking means is realized by the specific processing unit 290 of the data processing device 12 as a means for checking the phone number and linking with the police if the number has a history of misuse. The correspondence between each means and the device or control unit is not limited to the above example, and various changes are possible.
The response means includes an utterance understanding function based on speech recognition technology, and responds to calls from unregistered or withheld numbers in natural language generated by a pre-trained AI. This AI has been trained on a large amount of call data and can generate responses suited to a variety of scenarios. When a call arrives, the AI analyzes its content in real time and selects an appropriate response to carry the conversation forward. Because the response is customized to the nature of the call, questions in sales calls are answered politely and suspicious calls are answered cautiously. The response means can also make full use of speech synthesis and utterance understanding technology so that a pre-trained AI responds appropriately to calls from unregistered or withheld phone numbers. This AI learns multiple response scenarios, grasps the intention of the call, and provides natural dialogue suited to the situation. Because the AI selects a response instantly based on the content of the call and tailors it to the purpose of the call, inquiries in sales calls are handled appropriately and suspicious calls are answered cautiously.
The sending means includes speech recognition technology that converts the contents of the call into text and natural language generation technology that generates a summary of the call from that text. When the call ends, the AI converts the contents of the call into text data, summarizes it, and documents it. The generated document is sent to the user via a communication channel such as a messenger app or email. This process lets the user check the details of the call later, so that important information is not missed. The sending means can also convert the contents of the finished call into text using speech recognition technology, summarize it using natural language processing technology, and send the generated document to the user via a messenger app or email. The user can check the details of the call at any time through the document, so important information is recorded and not overlooked.
The confirmation means has a function of verifying identity by using specific questions or passwords when a caller claims to be a family member or acquaintance. With this means, the AI uses questions and passwords set by the user, and the caller is confirmed to be the person they claim to be by giving the correct response. If identity verification succeeds, the AI continues the call and transfers it to the user if necessary. This process prevents unauthorized access and fraud while ensuring smooth communication for genuine callers. The confirmation means can also have the AI present the questions and passwords set by the user to a caller claiming to be a family member or acquaintance, and continue the call or transfer it to the user only if the correct answer is given. This function forestalls unauthorized access and fraud while enabling smooth communication once the caller's identity is confirmed.
The connection means has a function that allows a person in a specific occupation, such as a delivery worker, to speak with the user by using a password for that occupation. With this means, the AI receives the business password, confirms that it is correct, and then connects the call to the user. The business password is important for ensuring the security of the call, and if an incorrect password is given, the call is not connected. The connection means can also serve the function in which the AI receives an occupation-specific password, such as one for delivery companies, verifies its validity, and then connects the call to the user. This password ensures the security of the call, and if the password is incorrect, the call is disconnected.
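Both the confirmation means and the connection means come down to comparing a spoken answer with a secret that the user registered in advance. The sketch below illustrates one way this could be done, assuming the spoken answer has already been converted to text. The role names, the example secrets, and the functions `normalize`, `answer_matches`, and `route_call` are hypothetical; the description does not say how answers are stored or compared.

```python
import hmac
import unicodedata


def normalize(answer: str) -> str:
    """Fold character width, case, and surrounding whitespace before comparing."""
    return unicodedata.normalize("NFKC", answer).strip().casefold()


def answer_matches(spoken: str, registered: str) -> bool:
    """Compare with hmac.compare_digest, which runs in constant time."""
    return hmac.compare_digest(
        normalize(spoken).encode("utf-8"), normalize(registered).encode("utf-8")
    )


def route_call(claimed_role: str, spoken: str, secrets: dict) -> str:
    """Return 'connect' if the answer matches the secret for the claimed role,
    otherwise 'disconnect'. Roles without a registered secret are refused."""
    registered = secrets.get(claimed_role)
    if registered is not None and answer_matches(spoken, registered):
        return "connect"
    return "disconnect"


secrets = {"family": "blue heron", "delivery": "parcel 42"}  # example values
print(route_call("delivery", " Parcel 42 ", secrets))
```

Normalizing before comparison is a design choice to tolerate transcription differences such as full-width characters or capitalization; a stricter deployment could skip it.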
The linking means has a function of checking the phone number in real time during a call and automatically linking with the police or related organizations if the number has a history of misuse. This means refers to a list of suspicious numbers stored in a database and checks whether the incoming number is on the list. If the call is determined to be from a suspicious number, the system immediately starts a protocol for reporting to the police and takes appropriate action depending on the situation. The linking means can also monitor the number on the call in real time and automatically link with related organizations such as the police if the number has a history of past misuse. When a suspicious number is detected, the protocol for reporting to the police is activated and a prompt response is made.
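The database lookup performed by the linking means can be illustrated with Python's standard `sqlite3` module. This is a sketch under assumptions: the table name `misused`, the report fields, and the placeholder phone numbers are invented for illustration, and no actual police reporting interface is modeled, since the description does not define one.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Iterable


def open_misuse_db(numbers: Iterable[str]) -> sqlite3.Connection:
    """Create an in-memory table of numbers with a history of misuse."""
    conn = sqlite3.connect(":memory:")
    # PRIMARY KEY gives an index, so each lookup is a fast key search.
    conn.execute("CREATE TABLE misused (number TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO misused VALUES (?)", [(n,) for n in numbers])
    return conn


def has_misuse_history(conn: sqlite3.Connection, number: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM misused WHERE number = ?", (number,)
    ).fetchone()
    return row is not None


def build_report(number: str, started_at: datetime) -> dict:
    """Assemble the data a report might carry; the format is an assumption."""
    return {"number": number, "call_started": started_at.isoformat()}


conn = open_misuse_db(["+810000000009"])
incoming = "+810000000009"
if has_misuse_history(conn, incoming):
    report = build_report(incoming, datetime(2024, 3, 28, 9, 0, tzinfo=timezone.utc))
    print(report)  # a real system would hand this to its reporting channel
```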
These means are implemented by combining multiple AI technologies and linking functions in order to enhance user security and convenience. The data processing device and the control unit of the smart device cooperate to provide a platform for realizing these means flexibly. Each means is designed not only to function independently but also to work together with the others to provide more advanced services. Through these means, users can automate the handling of calls from unregistered or withheld phone numbers and protect themselves from potential risks that arise in daily life. These means are also built around AI technology, improving user security while making daily communication more efficient. As an alternative means of data collection that does not use sensors, a process is possible in which the user enters information manually and the system uses that information. With these means, users can automate the handling of unregistered or withheld calls and strengthen their protection against unforeseen events.
Further functions can be added to the system of this embodiment to enhance its convenience. For example, a function that analyzes the user's emotions during the conversation may be added to the response means: emotions are estimated from the other party's tone of voice and manner of speaking, and the AI responds accordingly. This enables a more careful and empathetic response when the other party is angry or anxious, improving the quality of the call.
A function may be added to the sending means that automatically integrates the documented call contents into the user's calendar and reminders. This allows appointments and tasks arising from the call to be managed without being forgotten, improving productivity.
The confirmation means may be provided with a function for verifying identity using the user's biometric information. For example, biometric information obtained from a smartwatch or fitness tracker is used to confirm that the other party is actually a family member or acquaintance. This achieves a higher level of security while reducing effort for the user.
A function may be added to the connection means in which the delivery company authenticates by scanning a QR code (registered trademark) or NFC tag, and the call is then connected directly to the user. This removes the need to exchange a password and allows quicker and safer coordination with the delivery company.
A function may be added to the linking means that detects suspicious keywords in the voice data of the call and automatically analyzes the content. If the call is judged suspicious based on the keywords, the user is warned first, before any report is made to the police, which supports a more accurate decision.
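The keyword-based check with a warning before any report could be sketched as follows, assuming the call audio has already been transcribed. The phrase list and the function names are hypothetical, and treating the user's response to the warning as a confirmation step is one possible reading of "warn the user first", not something the description states.

```python
from typing import Callable, List

# Hypothetical phrase list; the source does not specify which keywords are used.
SUSPICIOUS_PHRASES = ("gift card", "wire transfer", "bank account number")


def find_suspicious(transcript: str) -> List[str]:
    """Return the suspicious phrases that occur in the transcript."""
    text = transcript.lower()
    return [p for p in SUSPICIOUS_PHRASES if p in text]


def decide(transcript: str, user_confirms: Callable[[List[str]], bool]) -> str:
    hits = find_suspicious(transcript)
    if not hits:
        return "no_action"
    # The user is warned first; a report follows only if the user agrees.
    return "report_to_police" if user_confirms(hits) else "warned_only"


print(decide("Please buy a gift card and read me the code.", lambda hits: True))
```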
With these additional functions, the system of this embodiment not only automatically answers calls, but also more efficiently supports security and schedule management in daily life. In addition, by linking various AI technologies with databases, it provides greater security and convenience to the user's life. Furthermore, by integrating these technologies with the user's smart devices and IoT devices in the home, it becomes possible to realize a more personalized experience.
The system of this embodiment can be extended further. To increase convenience and security, a function that analyzes the caller's voiceprint and compares it with the registered voices of family members and acquaintances can be added to the response means. This makes it possible to determine more accurately whether the caller is who they claim to be, and to continue the call safely without using specific questions or passwords.
The sending means can incorporate a function that translates the call contents obtained through speech recognition and natural language processing into multiple languages selected by the user, supporting communication between users who speak different languages. This promotes use in international business settings and in multilingual households.
A function can be added to the confirmation means by which the user verifies their identity by showing a specific gesture or movement to the camera. This allows identity verification to be performed easily and quickly during a call, further strengthening security.
As for the connection means, a function can be added in which the AI verifies identity by comparing the caller's location information with delivery schedule information. This confirms that the delivery company is calling from the place where the goods are actually being delivered, so that the authenticity of the call can be guaranteed.
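Comparing the caller's location with the delivery schedule can be illustrated by a distance check between two coordinates. The sketch below uses the standard haversine formula for great-circle distance; the 1 km tolerance, the function names, and the assumption that both locations are available as latitude and longitude pairs are illustrative choices, not details from the description.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (latitude, longitude) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def caller_is_near_stop(
    caller: Tuple[float, float], stop: Tuple[float, float], max_km: float = 1.0
) -> bool:
    """True if the caller is within max_km of the scheduled delivery stop."""
    return haversine_km(caller, stop) <= max_km


print(caller_is_near_stop((35.6812, 139.7671), (35.6813, 139.7672)))
```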
The linking means can add a function that automatically notifies not only the police but also the user's designated emergency contact when a suspicious call is detected, enabling a rapid response in an emergency.
These additional functions allow for flexible response to user needs while maintaining the basic structure of the existing system. In addition, by linking with smart devices and IoT devices in the home, the system provides a seamless experience in the environment in which the user uses the system on a daily basis. With these additional functions, the system of this embodiment goes beyond automatic answering of calls, improving the security and convenience of the user in all aspects of daily life.
(Example 2)
This embodiment of the present invention is a system having a sending means that, when the matter is documented in chat GPT after the call ends and sent by messenger app or email, can send the matter to family members. Specifically, the user selects the document created after the call ends and specifies "family members" as the destination.
The sending means is realized by the specific processing unit 290 of the data processing device 12 as a means for documenting the matter in chat GPT after the call ends and sending it to the family by messenger app or e-mail. The correspondence between each means and the device or control unit is not limited to the above example, and various changes are possible.
The means for documenting the contents after the end of the call includes a conversation content analysis means that uses voice recognition technology. After the call ends, this analysis means converts the voice data of the call into text and extracts the main points of the call from the text data. The voice recognition technology uses a deep learning model trained on a large amount of voice data so that it supports various languages and accents. To clarify the main points, the conversation content analysis means combines morphological analysis and syntactic analysis to identify important information exchanged in the conversation and content requiring action. In particular, a keyword extraction function identifies words and phrases that appear frequently in the conversation, and the main points of the call are documented based on them.
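The frequency-based keyword extraction step can be sketched as follows. This is a simplified illustration only: it substitutes a regular-expression tokenizer and a small hypothetical stopword list for the morphological and syntactic analysis the text describes, and it assumes the transcript is already available as English text.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a proper language resource.
STOPWORDS = {"the", "a", "an", "to", "of", "and", "is", "at", "on", "in",
             "for", "will", "be", "i", "you", "it"}


def extract_keywords(transcript, top_n=3):
    """Return the most frequent non-stopword tokens in the transcript."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]
```

For Japanese transcripts the tokenizer would have to be replaced by a morphological analyzer, since words are not separated by spaces.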
The means for formatting the documented content into a message includes a text editor function, which organizes the parsed text and arranges the structure and layout of the document. The message formatting means also provides an interface that allows the user to easily check the content of the message and add edits or additional information as necessary.
The sending means sends a message to a family member selected by the user using a destination designation function. This function works in conjunction with the user's contact list to automatically send the message based on the contact information of the selected family member. The sending means also has a function for confirming message transmission and managing a transmission history, so that the user can track the status of a sent message and resend it as necessary. The message is sent in conjunction with a messenger app or an email app; the linking function authenticates the user with the user's account information and transmits the message securely. As a security measure, the sending means encrypts the contents of the message and performs an authentication process at the time of transmission to protect privacy. The sending means also transmits the message in a format appropriate to the transmission method selected by the user: for example, as an instant message in the messenger app and as an email in the email app.
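As a rough illustration of destination designation, per-channel formatting, and transmission history, consider the sketch below. The `FamilyOutbox` class, its field names, and the `"queued"` status are hypothetical; authentication, encryption, and the actual messenger or email transport are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class FamilyOutbox:
    # Contact list (assumed shape): name -> {"messenger": account id, "email": address}
    contacts: dict
    history: list = field(default_factory=list)

    def send(self, recipient, summary, channel):
        """Format the summary for the chosen channel and record it in the history."""
        if channel == "email":
            body = f"Subject: Call summary\n\n{summary}"
        elif channel == "messenger":
            body = summary  # instant messages carry the summary as-is
        else:
            raise ValueError(f"unsupported channel: {channel}")
        record = {
            "to": self.contacts[recipient][channel],
            "channel": channel,
            "body": body,
            "status": "queued",  # a real transport would update this status
        }
        self.history.append(record)
        return record
```

Keeping every record in `history` is what would let a user track the status of a message and resend it later.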
An example of data collection that does not involve sensors is when a user manually takes notes on a phone call. In this case, the user records the main points of the call after the call ends, and the text data is sent to the family as a message. The manually recorded call contents are organized using the text editor function and sent to the family as a formatted message by the sending means. Manual recording is suitable for situations where automatic documentation using voice recognition technology cannot be applied, or when the user wants to convey specific information in his or her own words.
In an embodiment of the present invention, functions beyond documenting the contents of a call can be considered. For example, a function can be added that automatically registers appointments and events mentioned during a call in a calendar in cooperation with a schedule management system based on the text generated by speech recognition. This can save the user the trouble of manually managing the schedule after the call. In addition, it can suggest adding events to a calendar shared among family members, making it easier for all family members to share the schedule.
It is also possible to automatically generate task lists based on documented text data and post them to a shared platform that can be accessed by all family members, where each family member can update the progress of tasks and check off completed tasks, creating an environment for the whole family to share information and cooperate.
In addition, a function can be added that performs sentiment analysis on written messages and reflects the emotional nuances expressed during a call in the text, helping recipients understand the caller's intent more accurately. For example, if emotions such as joy or anxiety are expressed during a call, those emotions are conveyed in the text with specific emojis and formatting, enriching the communication.
Another possible function is to use voice recognition and analysis to automatically generate a list of frequently asked questions (FAQ) from the content of calls, building a knowledge base that family members can refer to when they have similar inquiries. This knowledge base is shared among family members and updated every time a new call occurs, improving the efficiency of communication between family members.
In addition, the text generated after the call ends can be sent not only via messenger apps or email, but also played back in audio format, making it easier for visually impaired family members or children with difficulty reading and writing to receive the information.
Finally, to protect the privacy of family members, sensitive information in the content documented after the call can be identified and automatically anonymized or masked, providing an environment in which information can be shared with peace of mind. This strikes a balance between respecting each individual's privacy and sharing only the information that is necessary.
This embodiment of the present invention relates to automatic documentation and transmission of call contents, and new functions that build on the document can be added. For example, the document can be analyzed after the call ends to automatically generate action items, and a reminder can be set on the user's smart device for each task that needs to be addressed. Reminders can be prioritized based on the deadlines and importance mentioned in the call, helping the user to act without forgetting.
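Prioritizing reminders by deadline and importance could look like the following sketch. The `ActionItem` fields and the 1-to-3 importance scale are assumptions for illustration; the source does not define how importance is scored or how ties are resolved.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    title: str
    deadline: date
    importance: int  # hypothetical scale: 1 (low) to 3 (high)


def prioritize(items):
    """Order items by earliest deadline, then by highest importance."""
    return sorted(items, key=lambda item: (item.deadline, -item.importance))
```

Sorting on the deadline first is a design choice of this sketch; a system that weights importance above deadline would simply swap the two key components.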
Additionally, to enhance communication among family members, it would be beneficial to add the ability to automatically attach relevant images, videos, and links based on specific words or phrases in a written message, allowing for visual information to be shared in addition to text-based messages, making communication richer.
In addition, the system can enhance security by using natural language processing techniques to automatically detect and obscure certain personal or confidential information when documenting call transcripts, in accordance with privacy protection guidelines.
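A very reduced form of this masking can be sketched with regular expressions. Note that this swaps in simple pattern matching for the natural language processing the text refers to, and the two patterns below (a hyphenated phone number and an email address) are illustrative assumptions rather than a complete definition of sensitive information.

```python
import re

# Hypothetical patterns; real guidelines would define what counts as sensitive.
PATTERNS = {
    "phone": re.compile(r"\b\d{2,4}-\d{2,4}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def mask_sensitive(text):
    """Replace each matched span with a bracketed label such as [PHONE]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text
```

Names, addresses, and account numbers would need entity recognition rather than fixed patterns, which is outside the scope of this sketch.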
When converting call contents into text, it is also effective to incorporate a translation function into multiple languages to meet the diverse needs of users. In a multicultural environment where family members speak different languages, it is possible to automatically translate the call contents and send messages in a language that each member can easily understand.
To further enhance convenience, a schedule function will be added that allows users to customize the timing of message sending. Users can set messages to be sent at a specific date and time, rather than just immediately, allowing them to optimize the timing at which their family members receive information.
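Scheduled sending can be modeled as a queue ordered by send time. The sketch below is one possible arrangement; the `ScheduledOutbox` class and its method names are hypothetical, and actual delivery is left to whatever transport the system uses.

```python
import heapq
from datetime import datetime


class ScheduledOutbox:
    """Holds messages until their user-chosen send time (hypothetical helper)."""

    def __init__(self):
        self._queue = []  # heap of (send_at, sequence, message)
        self._seq = 0     # tie-breaker so equal times keep insertion order

    def schedule(self, message, send_at):
        heapq.heappush(self._queue, (send_at, self._seq, message))
        self._seq += 1

    def pop_due(self, now):
        """Return every message whose send time has arrived, earliest first."""
        due = []
        while self._queue and self._queue[0][0] <= now:
            due.append(heapq.heappop(self._queue)[2])
        return due
```

A periodic job would call `pop_due(datetime.now())` and hand the returned messages to the sending means.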
Finally, quick action buttons for replying to or confirming messages can be provided so that recipients can easily act on received messages. This enables quick feedback and efficient communication, making information sharing between family members even smoother.
(Example 3)
This embodiment of the present invention is a system that checks the phone number during a call and has a linking means for coordinating with the police if the number has a history of misuse. Specifically, the system is equipped with a function that checks the incoming number against a database during a call and automatically notifies the police if it detects that the number has a history of misuse.
The means for coordinating with the police is realized by the specific processing unit 290 of the data processing device 12 as a function for checking the incoming number against a database during a call and automatically reporting to the police if a number with a history of misuse is detected. The correspondence between each means and the device or control unit is not limited to the above example, and various modifications are possible.
The extraction means extracts the caller information contained in the signal of each call sent through the communication network, and uses digital signal processing technology to accurately obtain the caller's telephone number from the call data. The extraction means also has a signal analysis function that analyzes Caller ID information and identifies the telephone number. By combining this signal extraction function with an analysis algorithm specialized in decoding Caller ID information, the system can accurately identify the calling number.
The matching means matches the extracted telephone number against a database that collects telephone numbers suspected of fraudulent use and organizes them by category. Matching is performed in real time through a database management system, and high-performance search algorithms and indexing allow a number with a history of misuse to be looked up rapidly while the call is in progress.
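The indexed lookup can be illustrated with an in-memory hash index keyed by a normalized number. This is a sketch under stated assumptions: the `MisuseIndex` class, the category labels, and the normalization rule (treating a `+81` prefix as equivalent to a leading `0`) are hypothetical, and a real deployment would query a database management system instead.

```python
import re


def normalize(number):
    """Reduce a phone number to digits, mapping an assumed +81 prefix to a leading 0."""
    number = number.strip()
    if number.startswith("+81"):
        number = "0" + number[3:]
    return re.sub(r"\D", "", number)


class MisuseIndex:
    """Hash index from normalized number to misuse category (hypothetical)."""

    def __init__(self, entries):
        self._index = {normalize(num): category for num, category in entries.items()}

    def lookup(self, number):
        """Return the misuse category, or None if the number is not listed."""
        return self._index.get(normalize(number))
```

Because the lookup is a single dictionary access, it is fast enough to run while the call is still being connected.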
The linking means has a function of automatically reporting to the police when a number with a history of misuse is detected, and includes an interface function with the reporting system. When reporting, a report data generation function creates the report information in a data format based on the protocol agreed with the police reception system, and a secure report transmission function sends it to the police reception system over a secure communication channel. The report information includes the detected misused phone number, the date and time of the call, the duration of the call, and the telecommunications carrier used by the caller. Encryption and authentication are used to protect personal information and to ensure the security and reliability of the report.
The linking means is also compatible with multiple communication protocols and reporting systems. An adapter function enables data exchange between different systems: in the reporting process, a report protocol selection function selects an appropriate protocol, and the adapter adjusts the format of the report information to meet the requirements of the police reception system. When the reporting process is activated, the system transmits the report information to the police reception system and obtains a report reception confirmation. A report history recording function stores this confirmation as a record that the report was made correctly. The report history is used for later analysis and improvement, contributing to the efficiency and accuracy of the reporting process.
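The adapter idea, one report rendered into whichever format the receiving system expects, can be sketched as below. Everything here is illustrative: the `MisuseReport` fields follow the items listed in the text, but the two formats (`"json"` and `"kv"`), the `transport` callable, and the history record shape are assumptions, and no real police reception protocol is implied. Encryption and authentication are omitted.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class MisuseReport:
    phone_number: str
    call_started_at: str   # ISO 8601 timestamp
    duration_seconds: int
    carrier: str


def to_json(report):
    return json.dumps(asdict(report), sort_keys=True)


def to_kv(report):
    return ";".join(f"{key}={value}" for key, value in asdict(report).items())


# Hypothetical protocol names mapped to their formatting adapters.
ADAPTERS = {"json": to_json, "kv": to_kv}


def submit(report, protocol, transport, history):
    """Format the report for the chosen protocol, send it, and log the receipt."""
    payload = ADAPTERS[protocol](report)
    ack = transport(payload)  # transport is any callable returning a receipt
    history.append({"protocol": protocol, "acknowledged": bool(ack)})
    return ack
```

Supporting another reporting system would then only require registering one more adapter function.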
As an example of data collection without sensors, a function is provided for users to report suspicious calls through an application or web interface. Using an interactive reporting form, the user enters their experience of the call and any suspicious points they noticed, and the information is registered in a database. Numbers collected through these manual reports can be added to the list of phone numbers that is automatically matched, which expands the matching database with data based on the user's experience and intuition and improves the accuracy of detecting numbers with a history of misuse.
The system can be equipped with a function to strengthen the database update mechanism. For example, newly confirmed abused numbers will be automatically added to the database even after they are reported. In addition, when a suspicious call is reported, the credit information of the number will be checked against other databases and combined with information based on user reports to more accurately identify abuse history. To ensure the integrity of the database, a regular cleaning process will be carried out to eliminate incorrect and outdated information. In addition, to strengthen cooperation with the reporting system, a function will be introduced to directly link with the crime database provided by the police, obtain crime information in real time during the matching process, and improve the accuracy of the matching results.
To improve the immediacy of reports, the matching process begins the moment the call is initiated, and if a number with a history of misuse is detected, the user is given the option to sound an alarm or have the call hung up automatically. In addition, when the call is hung up, a recording of the call can be saved on the user's behalf to assist police investigations. When police intervene, a detailed report including the caller's location and call history is automatically generated, helping to speed up criminal investigations.
The user interface will be updated to include a feature that allows users to check the progress of the reporting process in real time to increase transparency in the reporting system. Users will be able to check the results of their reports and feedback from the police, which will increase trust in the system. The system will also provide statistical data and trend analysis on abused numbers, providing information to users to be cautious about calls.
Furthermore, machine learning technology will be introduced to identify numbers with a history of abuse, and new numbers likely to be abused will be predicted by analyzing various indicators such as call patterns and call frequency. This will enable proactive updates to the database, strengthening measures to prevent unknown criminal activity. A function will also be added that allows users to provide direct feedback on the effectiveness of the reporting system, which will help improve the system. Feedback will be provided anonymously, protecting user privacy while collecting valuable information that will contribute to improving the system.
To make the phone number check during a call more effective, AI will learn the patterns of various fraudulent scams that users may face and implement a function to flag in real time when certain words or phrases are detected during a call. This will allow malicious behavior to be inferred and detected from the content of the call, rather than simply matching the phone number with abuse history in the database. In addition, the reporting process will be simplified by providing shortcuts and buttons on the smartphone interface that allow users to easily report calls that they suspect are fraudulent. This will allow the database to be updated more quickly and improve protection for other users.
To strengthen cooperation with the police, the reporting system will be integrated with location tracking capabilities to help police respond quickly based on reported information, helping them track and capture criminals. Artificial intelligence will also be built into the reporting system to analyze crime patterns from report data and provide police with information to plan preventive vigilance activities. Using such predictive analytics will help prevent future crimes.
To facilitate data sharing with the police, the system will receive real-time information on fraud cases and other crimes known to the police and update the database. This will ensure that the reporting system operates based on the latest crime information and strengthen measures to protect users. In addition, the system will be equipped with a block list function, allowing users to register suspicious numbers and block calls from them. This will allow users to directly control the risks themselves.
Educational programs will include online courses and workshops to provide users with information to recognize and prevent fraud methods. This will provide users with the knowledge to protect themselves and raise security awareness in society as a whole. Examples of fraud cases that were prevented by using the reporting system will also be shared to help users understand the effectiveness of the system.
Finally, through a system update, if a call is deemed likely to be fraudulent, a warning message will be automatically sent to the user to warn them against fraud. This will allow users to immediately recognize the possibility of fraud and take appropriate action.

（形態例１）
本発明を実施するための形態は、未登録または非通知の電話番号からの着信時に、生成系AIが応答する応答手段と、通話終了後に用件をチャットGPTで文書化し、メッセンジャーアプリやメールで送信する送信手段とを具備するシステムである。さらに、ユーザの感情を認識する感情エンジンを組み合わせる組み合わせ手段として、通話中にユーザの声のトーンや言葉の選択などを分析し、感情を推定する。推定された感情に基づいて、生成系AIの応答や文書化された用件の表現を調整する。例えば、ユーザが不安な感情を示している場合には、より穏やかな表現を使用することで安心感を与える。
応答手段は、データ処理装置１２の特定処理部２９０によって実現され、未登録または非通知の電話番号からの着信に対して生成系AIが応答する。送信手段は、通話終了後に用件をチャットGPTで文書化し、メッセンジャーアプリやメールで送信する手段として、データ処理装置１２の特定処理部２９０によって実現される。感情認識手段は、通話中にユーザの声のトーンや言葉の選択を分析し、感情を推定する手段として、データ処理装置１２の特定処理部２９０またはロボット４１４の制御部４６Ａによって実現される。各手段と装置や制御部との対応関係は、上述した例に限定されず、種々の変更が可能である。
応答手段は、自然言語処理と音声合成を組み合わせた対話生成手段が含まれる。この対話生成手段は、未登録または非通知の着信に対して自動で応答を行い、ユーザとの対話を可能にする。未登録または非通知の着信を検出すると、生成系AIは事前に訓練された会話モデルを用いて応答する。この会話モデルは、様々なシナリオに対応できるように多様な対話データを基に学習しており、着信者の質問や要求に対して適切な回答や案内を提供する。また、応答手段は、AIによる対話生成機能を備え、未登録または非通知の電話番号からの着信に対して、自動的に適切な返答を行うことができる。この機能は、通話の初期段階で着信者の目的や要望を理解し、対応する応答を行うための対話管理機能と、生成された応答を自然な音声に変換することもできるための音声生成機能を組み合わせたものである。対話管理機能は、特定のキーワードやフレーズの検出に基づいて着信者の意図を分析し、適切な返答を生成することができる。音声生成機能は、テキストベースの応答をリアルタイムに音声に変換し、着信者に対して自然な会話体験を提供することができる。
送信手段には、チャットボットや自然言語理解技術を用いた文書化手段が含まれる。通話終了後、チャットGPTのような高度な自然言語理解モデルを使用して、通話内容を精確に文書化する。文書化手段は、通話の要点を抽出し、要約する能力を有しており、用件を簡潔かつ明瞭に伝える文書を生成することができる。生成された文書は、メッセンジャーアプリやメールを介してユーザに送信される。この過程には、ユーザのメールアドレスやメッセンジャーアカウントへの連携機能が含まれ、文書は適切な形式で自動的に送付される。また、送信手段は、通話内容をテキスト化し、これをユーザがアクセス可能な形で提供することができる。通話が終了すると、通話内容文書化機能が活動し、会話の主要なポイントを抽出し、要約することができる。この要約されたテキストは、自動配信機能を通じてユーザ指定のメールアドレスやメッセンジャーアプリに送信される。この自動配信機能には、文書を適切なフォーマットで整え、指定された送信先に確実に届けるためのメール送信機能やアプリ連携機能が含まれる。
感情認識手段には、音声分析を行う声紋解析手段と、言葉の選択から感情を推定する言語解析手段が含まれる。声紋解析手段は、スマートデバイスのマイクロフォンを利用してユーザの声のトーン、ピッチ、速度などの特徴を検出し、それらの声の特性からユーザの感情状態を推定することができる。言語解析手段は、ユーザの発言の内容を解析し、使用される言葉やフレーズから感情的なコンテキストを抽出することができる。これらの分析結果は、生成系AIが応答を行う際や、文書化された用件の表現を調整する際に使用される。例えば、ユーザが不安を示している場合、応答や文書の表現はより穏やかで安心感を与えるように調整される。また、感情認識手段は、通話中のユーザの声の特徴と言語使用から感情を推定する機能を有する。声紋分析機能は、マイクロフォンによって収集された音声データから、声の高低、強弱、速度などの特徴を抽出し、これらの特性を解析することで感情を推定することができる。言語感情分析機能は、通話中の言語データを処理し、使用される単語やフレーズが持つ感情的な意味を解析し、ユーザの感情状態を把握することができる。これらの分析結果は、AIが行う応答のトーンや、文書化された通話内容の表現を調整する際に利用され、ユーザに対して適切な感情的対応を提供することができる。
センサーを含まないデータ収集手段としては、ユーザが自身で入力するテキストデータや、システム利用に関するフィードバックが挙げられる。これらは、ユーザ入力受付機能やフィードバック収集機能を通じてシステムに提供され、サービス改善のための貴重な情報源として活用される。
これらの手段は、ユーザの要求に迅速かつ効果的に対応し、コミュニケーションの質を向上させることを目的としている。また、各手段の実装は、データ処理装置やスマートデバイスの制御部によって柔軟に行われ、システムの効率性とユーザビリティを高めるために様々な形で変更が可能となっている。
本発明のシステムは、追加機能として、未登録または非通知の着信に対して、応答前に通話者の意図を推測するための概要予測手段を備えることができる。これにより、応答手段がより精度の高い対話を生成し、ユーザーにとって有意義なやり取りが実現する。また、生成系AIが応答する際には、通話者の国や地域に基づいた言語選択機能を持たせ、多言語対応の自動応答が可能となる。
送信手段に関しては、通話内容の文書化に加えて、重要なキーワードやフレーズのハイライト機能を設けることで、ユーザーが文書を素早く把握できるようにする。さらに、文書化された内容に基づいて自動的にアクションアイテムを生成し、ユーザーのタスクリストに追加する機能を追加することができる。
感情認識手段においては、通話中にユーザーの感情が変化した場合、その変化をリアルタイムで検知し、応答手段の対話のトーンやテンポを動的に調整する機能を持たせることができる。また、特定の感情が検出された場合には、それに応じた特別なサポートやアドバイスを提供する専門家への連絡を促すプロトコルも組み込むことができる。
応答手段には、着信者の過去の通話履歴や関連データを分析し、より個人化された応答を提供するパーソナライゼーション機能を追加することができる。これにより、ユーザーにとってより関連性が高い情報を提供し、応答の有用性を高めることが可能となる。
送信手段に関しては、文書化された通話内容に基づいてフォローアップのアクションを提案する機能を追加することができる。例えば、通話内容に含まれるタスクや予定に対して、カレンダーアプリへの自動登録機能を統合することで、ユーザーの時間管理をサポートする。
さらに、感情認識手段は、ユーザーのストレスレベルや緊張感を検知し、適宜、ストレス軽減のためのアドバイスやリラクゼーションコンテンツへのリンクを提供する機能を備えることができる。これにより、ユーザーの精神的な健康を支援し、総合的なウェルビーイングを促進することができる。
本形態例のシステムには、さらなる機能向上を図るための複数の追加機能が考慮される。例えば、未登録または非通知の着信に対して、通話者の声紋を分析し、以前の通話データと照合することにより、通話者の身元を特定する声紋認識手段を追加することができる。これにより、通話者が過去にシステムとやり取りしたことがある場合、その情報を基に応答手段がより適切な対応を行うことが可能となる。また、声紋認識手段は、セキュリティ対策としても機能し、ユーザーに対する信頼性の高い通話体験を提供する。
送信手段についても、通話内容を文書化する際に、通話の内容を構造化し、情報の重要度に応じてテキストの階層化を行う機能を考慮する。これにより、ユーザーは文書を読む際に重要な情報をより迅速に把握できるようになる。さらに、文書化された内容をユーザーの好みや過去の行動パターンに合わせてカスタマイズすることで、より個人的な体験を提供することが可能となる。
感情認識手段では、通話中にユーザーのストレスレベルを検知し、ストレスが高いと推定される場合には、通話内容に関連したリラクゼーション方法や心理的サポートへの案内を提供する。これにより、ユーザーが通話を通じてリラックスし、ストレスを軽減できるようなサービスを提供する。
さらに、応答手段には、通話内容に基づいてユーザーへのフォローアップアクションを自動的に提案する機能を追加する。例えば、通話中に提案された製品やサービスに関する追加情報へのリンクを提供したり、次の行動ステップを提案することで、ユーザーの意思決定をサポートする。
応答手段の改善としては、通話者の意図に応じて自動的に応答スタイルを変更する機能を検討する。たとえば、通話者が緊急の状況を示している場合には、迅速かつ的確な指示を提供するようにAIを調整する。このような対応により、通話者のニーズに即応できるようなシステムを実現する。
これらの追加機能は、ユーザーエクスペリエンスの向上を目指すとともに、通話内容の正確な把握と迅速な対応を可能にするためのものである。また、それぞれの機能は、データ処理装置やスマートデバイスの制御部の能力を最大限に活用し、システムの有用性をさらに高めることが期待される。
（形態例２）
本発明を実施するための形態は、通話終了後に用件をチャットGPTで文書化し、メッセンジャーアプリやメールで送信する際に、ユーザの感情を認識する感情エンジンを使用して文書の表現を調整する調整手段とを具備するシステムである。具体的には、文書化された用件を感情エンジンに入力し、ユーザの感情を分析する。分析結果に基づいて、文書の表現を適切に調整する。例えば、ユーザが喜びや興奮を感じている場合には、より明るく活気のある表現を使用することで、ユーザの感情を共有する。
調整手段は、通話終了後に用件をチャットGPTで文書化し、メッセンジャーアプリやメールで送信する際に、ユーザの感情を認識する感情エンジンを使用して文書の表現を調整する手段として、データ処理装置１２の特定処理部２９０によって実現される。文書化された用件を感情エンジンに入力し、ユーザの感情を分析する処理は、特定処理部２９０が実行する特定処理プログラム５６に従って行われる。各手段と装置や制御部との対応関係は、上述した例に限定されず、種々の変更が可能である。
会話内容抽出手段は、音声認識技術を駆使して通話内容をテキストデータに変換する。また、会話内容抽出手段は、音声信号を受信した後、ノイズリダクション手段を用いて背景雑音を除去し、音声からテキストへの変換精度を向上させる。さらに、会話内容抽出手段は、最先端の音声認識技術を用いて、通話内容を精確にテキストデータへ変換し、その過程でノイズ除去手段が背景雑音を除去し、変換精度を向上させる。
テキスト処理手段は、変換されたテキストデータの構文上の誤りを修正し、言語の流暢さを保つために文法検査手段を介して文法チェックを行う。また、テキスト処理手段は、テキスト化されたデータの文法検査手段が語法の正確性を保証し、文書の自然な流れを維持するための調整を行う。
感情認識手段には、テキストマイニングと感情分析技術に基づく感情抽出手段が含まれ、生成されたテキストデータの言葉遣いや文脈からユーザの感情を推定する。また、感情抽出手段は、様々な感情を表す単語やフレーズ、文法的パターンを識別し、それらをポジティブ、ネガティブ、ニュートラルなどの感情カテゴリに分類する。また、感情強調手段は、抽出された感情に応じてテキストのトーンや言い回しを調整し、ユーザの感情状態をより適切に伝えるための修正を行う。さらに、感情抽出手段は、文書に表れる言語パターンからユーザの感情を読み取り、それを基に文書のトーンを調整する。
調整手段は、感情認識手段によって分析された感情データを基に、テキストの表現を変化させる。また、表現強化手段は、喜びや興奮などのポジティブな感情が検出された場合、使われる語彙をより明るく活気のあるものに置き換え、メッセージに好意的な印象を与える。さらに、表現緩和手段は、悲しみや怒りなどのネガティブな感情が検出された場合、メッセージのトーンを穏やかにし、共感と理解を示すような言い回しを選択する。また、ユーザの感情がポジティブな場合は、表現強化手段が文書に活力を与え、ネガティブな感情を示す場合は、表現緩和手段によって、より穏やかな表現を使用する。
メッセージの送信手段には、メッセンジャーアプリやメールクライアントとの連携機能が含まれ、調整されたテキストを適切な形式で送信する。また、送信プロトコル選定手段は、受信者のプラットフォームや設定に合わせて最適な送信プロトコルを選択し、メッセージの配信を保証する。また、ユーザインタフェース提示手段は、送信前に文書の最終レビューを行うためのプレビュー画面を提供し、ユーザが必要に応じて最終的な修正を加えることができるようにする。さらに、メッセージ送信手段がメッセンジャーアプリやメールクライアントに適したフォーマットで調整されたテキストを送信し、プレビュー画面がユーザが文書を最終確認するためのインターフェースを提供する。
以上のプロセスは、ユーザの使用するデバイスや設定に応じて、スマートデバイスの制御部やデータ処理装置に内蔵された特定処理部で実現される。また、これらの手段は、モジュール化されたコンポーネントとして設計され、システムの構成要素としての交換や拡張が容易に行われる。さらに、各手段の対応関係はフレキシブルに設定されており、システムのアップグレードやカスタマイズに対応するための多様な変更が可能である。また、この一連のプロセスは、デバイスや環境に応じて柔軟に対応できるようにモジュール化されており、システムのアップグレードやカスタマイズが容易に行えるように設計されている。
この形態例を更に拡張して、ユーザーの感情をより深く理解し、コミュニケーションの質を高める機能を追加することができる。例えば、感情エンジンにビデオチャット中の表情認識機能を組み込むことで、視覚的な情報からも感情を分析し、より正確な感情判断を行う。感情認識の精度を向上させるために、ユーザーの声のトーンやピッチの分析も行い、テキストに反映させることが可能である。さらに、ユーザーの過去のコミュニケーション履歴や反応パターンを分析することで、個人の感情表現スタイルを学習し、それに応じたよりパーソナライズされたテキスト調整を実現する。
テキスト処理手段は、表現の多様性と創造性を高めるために、文学作品や詩などからインスピレーションを得た言い回しを提案する機能を持つ。これにより、ユーザーの感情がより豊かに表現される。また、社会的コンテキストや文化的背景を考慮し、コミュニケーションが行われる環境や状況に合わせた適切な表現を選択することもできる。
メッセージ送信手段には、送信されるテキストが受け手の感情に与える影響を予測する機能を追加し、ユーザーがより責任を持ってコミュニケーションを取れるようにする。さらに、受信者の反応をAIが予測し、その情報をもとにユーザーが次に取るべきコミュニケーション戦略を提案することも可能である。
全体として、これらの機能は、ユーザーが感情を的確かつ敏感にコミュニケーションに反映させることをサポートし、より深い人間関係の構築に貢献する。また、これらの進化した手段は、個人だけでなく、企業のカスタマーサポートやCRMシステムにおいても、顧客との関係を深めるために有効活用できる。さらに、これらのシステムは、ユーザー教育やカウンセリングといった人間の感情が重要な役割を果たす分野での応用が期待される。
本システムは、感情エンジンを活用してユーザーの感情に応じた文書の表現調整を提供する。この機能を拡張するために、ユーザーの生体情報を取得するセンサーを統合し、心拍数や皮膚の導電率などの生理的反応から感情をより正確に読み取ることができる。センサーからのデータはリアルタイムで分析され、文書のトーンを即座に調整することが可能となる。
さらに、ユーザーの日常的なコミュニケーションを継続的に分析し、その人固有の表現スタイルや好みを把握する個性化学習機能を搭載する。この機能により、システムはユーザーの個性を反映したより自然で個別化された文書の提案が可能になる。
また、マルチリンガル対応を強化し、さまざまな言語での感情的ニュアンスを捉えることができるようにする。この機能により、国際的なコミュニケーションや多言語を話すユーザー間での理解を深めることができる。
ユーザーのプライバシー保護のために、感情データの匿名化や暗号化を行い、セキュリティを強化する機能も追加する。これにより、ユーザーは安心してシステムを利用できるようになる。
教育やカウンセリングの分野での応用を目指し、感情認識の結果を活用してコミュニケーションスキルのトレーニングをサポートする機能を開発する。トレーニングプログラムには、感情表現の練習や、適切なコミュニケーション手法の学習が含まれる。
最後に、システムのユーザビリティを向上させるために、ユーザーインターフェースをリッチかつ直感的なものにする。さまざまなジェスチャーや音声コマンドをサポートし、ユーザーが文書の調整プロセスに容易に介入できるようにする。これにより、ユーザーは自分の意志で表現を微調整し、より個人的なコミュニケーションを実現できる。
（形態例３）
本発明を実施するための形態は、通話中にユーザの感情を認識する感情エンジンを使用して、応答や対話の内容をユーザの感情に合わせて調整する調整手段とを具備するシステムである。具体的には、通話中にユーザの声のトーンや言葉の選択などを感情エンジンに入力し、ユーザの感情を推定する。推定された感情に基づいて、生成系AIの応答や対話の内容を調整する。例えば、ユーザが悲しい感情を示している場合には、共感の言葉を用いて励ましや支援を提供する。
調整手段は、通話中にユーザの感情を認識する感情エンジンを使用して、応答や対話の内容をユーザの感情に合わせて調整する手段として、データ処理装置１２の特定処理部２９０またはロボット４１４の制御部４６Ａによって実現される。通話中にユーザの声のトーンや言葉の選択を感情エンジンに入力し、ユーザの感情を推定する処理は、特定処理部２９０が実行する特定処理プログラム５６に従って行われる。各手段と装置や制御部との対応関係は、上述した例に限定されず、種々の変更が可能である。
音声データ収集手段は、高感度でノイズキャンセリング機能を備えたマイクロフォンが含まれる。このマイクロフォンは、周囲の雑音を除去し、ユーザの声を明瞭に録音するためのデジタル信号処理アルゴリズムを搭載している。また、音声データ収集手段は、ユーザの発話から微細な音響特性を把握する高感度マイクロフォンを採用し、これにデジタル信号処理技術を組み合わせて周囲の雑音を効果的に除去する。このマイクロフォンは、ユーザの声の特性を正確に捉え、感情の変化を検出するための基礎データを提供する。
音声特徴抽出手段は、音声信号から人間の感情を反映する可能性のある特徴量を抽出する。この抽出手段は、音響特徴分析を行うためのスペクトログラム解析機能やピッチ追跡機能、音響モデルを用いた感情識別機能が含まれる。また、音声特徴抽出手段は、スペクトル分析機能やピッチ解析機能といった音響解析ツールを用いて、音声信号から感情を示唆する特徴量を抽出し、これらのデータを感情推定モデルへと供給する。
感情分析手段には、機械学習に基づいた感情推定モデルが含まれ、音声特徴抽出手段によって抽出された特徴量を入力として、ユーザの感情状態を推定する。感情推定モデルは、トレーニングデータに基づいて訓練されたニューラルネットワーク、サポートベクターマシン、決定木などの分類器から構成され、ユーザの感情をポジティブ、ネガティブ、中立などのカテゴリに分類する。感情推定モデルは、継続的な学習を通じてその精度を向上させ、ユーザの感情に対する認識の微妙な変化にも対応できるように進化する。また、感情分析手段は、機械学習技術を活用した感情推定モデルを有し、抽出された音声特徴を元にユーザの感情を分析する。このモデルは、様々な機械学習アルゴリズムを組み合わせて、ユーザの発話から感情カテゴリーを識別し、これに基づいてユーザの感情状態を推定する。推定された感情状態は、ユーザの発話や振る舞いに対するシステムの反応を調整するための重要な情報となる。
対話調整手段には、生成系AIモデルを用いた応答生成機能が含まれ、感情分析手段によって推定された感情に応じて、対話の内容を動的に調整する。応答生成機能は、自然言語生成技術を駆使し、ユーザの感情に適した言葉選びやトーンを用いて応答文を生成する。例えば、ユーザが悲しい感情を示している場合、共感表現や慰めの言葉を含む応答が生成される。この応答生成機能は、会話の文脈を考慮し、ユーザの感情とコミュニケーションの目的に合致した内容を提供するためのコンテキストアウェア処理機能を備える。また、生成された応答は、ユーザにとって自然であり、感情的なニーズを満たすように構築される。応答生成機能は、大規模な会話データセットに基づいて訓練された機械学習モデルによって実現され、ユーザの言葉遣いや話し方に適応することができる。また、対話調整手段には、生成系AIモデルを用いた応答生成機能が含まれ、推定された感情に適切に応じた対話内容を生成する。この機能は、ユーザの感情に対応する言葉選びや対話トーンを選定し、ユーザの現在の感情や会話の文脈に合わせた反応を提供する。応答生成機能は、コンテキストアウェアな処理を行い、ユーザが必要とする情報やサポートを適切な形で提供するために設計されている。この機能は、大量の会話データから学習されたモデルを基に、ユーザの言葉遣いや情緒に適応した応答を生成することが可能である。
センサーを含まないデータ収集手段としては、ユーザがシステムに直接入力したテキスト情報や、通話記録から得られるメタデータが考えられる。これらは、ユーザの行動パターンや好みを分析する際の補足情報として利用され、感情エンジンの精度向上や応答生成機能の最適化に寄与する。ユーザが入力するテキスト情報は、感情分析手段の一部として、感情推定モデルの訓練データとしても活用される。
本システムは、通話中にユーザの感情をリアルタイムで分析し、対話内容を自動調整する能力を有しているため、さらに細かな感情の変化を捉えるために、音声データに加えて、表情認識技術を統合することができる。ウェブカメラやスマートデバイスのカメラを活用して、ユーザの顔の表情を分析し、感情分析の精度を向上させる。この追加された表情認識機能は、ユーザの感情をより正確に認識し、さらに微細な感情変化に応じた対話調整が可能となる。
また、ユーザの生理的シグナルを捉えるためのウェアラブルデバイスを組み込むことも考えられる。心拍数や皮膚電気活動など、生理的反応を測定することで、声のトーンや表情だけでは捉えきれない感情の深層を解析する。これらのデータを統合することで、感情分析の精度はさらに向上し、より適切な対話応答を生成することができる。
対話調整手段には、ユーザの文化的背景や個人的な価値観を考慮したカスタマイズ機能を追加することも有効である。ユーザプロファイルを構築し、その情報に基づいて、対話のトーンや内容をさらにパーソナライズする。これにより、ユーザ一人ひとりに合わせたきめ細やかなサポートを提供することが可能となる。
さらに、感情推定モデルの進化を促すために、クラウドソーシングによる感情データの収集や、多様なユーザからのフィードバックを取り入れて、モデルを継続的にアップデートする仕組みを構築する。これにより、多様な感情表現や言語に対応できる柔軟なシステムとなる。
また、教育やメンタルヘルスケアの分野における応用も検討することができる。例えば、教育分野では、学生の感情に適応した教材の提示やカウンセリングセッションでの使用が考えられる。メンタルヘルスケアでは、ユーザの感情を認識し、ストレスや不安を軽減するための対話支援を行う。これにより、ユーザが抱える問題に対してより効果的なアプローチが可能となる。
システムのプライバシー保護に関しても、ユーザの感情データを安全に保管し、適切なアクセス制御と暗号化技術を用いて、情報漏洩のリスクを最小限に抑えるためのセキュリティ対策を強化する。これにより、ユーザは安心してシステムを利用することができる。
最終的には、このシステムが提供するパーソナライズされた対話体験が、ユーザの生活の質を向上させるようなサービスへと発展することが期待される。
本発明の形態は、通話のみならず、ビデオ会議やオンライン教育のプラットフォームにも適用可能である。例えば、講師が生徒の感情をリアルタイムで把握し、カリキュラムの進行を感情に合わせて調整することで、より効果的な学習経験を提供する。ビデオ会議においても、参加者の感情を反映した対話管理が行われ、生産的かつポジティブな会議環境を促進する。
また、本システムには、感情反応に基づいた健康状態の監視機能を追加することも可能である。例えば、ユーザの声のトーンや話し方が一定期間にわたってネガティブな感情を示している場合、メンタルヘルスの専門家に通知を送り、必要に応じた介入を促すことができる。
さらに、感情エンジンの高度化に向け、ユーザの日常生活における感情パターンを分析し、その情報を元に長期的な感情管理やストレス軽減のアドバイスを提供する機能を組み込む。ユーザの生活リズムや活動パターンを分析し、感情の波を予測することで、適切なタイミングでリラクゼーションやモチベーション向上のためのコンテンツを提案する。
このシステムは、カスタマーサポートの分野でも応用が期待される。例えば、コールセンターのオペレーターが顧客の感情をリアルタイムで把握し、不満や怒りなどのネガティブな感情を検出した際には、即座に対応策を講じ、顧客満足度の向上に寄与する。
また、ゲームやエンターテインメントの分野でも、ユーザの感情に応じてコンテンツを動的に変化させることで、没入感や楽しさを増幅させる効果が期待される。ゲーム内のキャラクターがプレイヤーの感情に反応し、ストーリー展開や対話内容が変化することで、よりパーソナライズされた体験を実現する。
さらに、音声アシスタントや仮想現実（VR）との統合を図り、ユーザの感情に対してより自然な対話を実現する。音声アシスタントはユーザの感情を把握し、個々のニーズに合わせた情報やサービスを提供する。VR環境では、ユーザの感情に応じてシナリオや環境が変化し、リアルタイムで感情に合わせた体験を提供する。
本システムは、ユーザインタフェース（UI）やユーザーエクスペリエンス（UX）の設計においても革新をもたらす可能性を秘めている。感情認識技術を利用して、ユーザの感情に最適化されたUIやUXを提供し、利用者の満足度を高める。例えば、ウェブサイトやアプリケーションがユーザの感情をリアルタイムで把握し、コンテンツの提示方法やインタラクションの形式を調整する。
最終的には、このシステムが提供する感情調整機能が、人間関係の質を向上させ、コミュニケーションの効果を高めるツールとして社会に広く浸透していくことが期待される。
(Example 1)
The embodiment of the present invention is a system that includes a response means by which a generative AI answers when a call is received from an unregistered or withheld phone number, and a transmission means by which the matter is documented with ChatGPT after the call ends and sent via a messenger app or email. In addition, a combination means incorporates an emotion engine that recognizes the user's emotions: the tone of the user's voice and choice of words are analyzed during the call to estimate the user's emotions. Based on the estimated emotions, the generative AI's response and the wording of the documented matter are adjusted. For example, if the user shows anxiety, calmer wording is used to provide a sense of security.
The response means is realized by the specific processing unit 290 of the data processing device 12 as a means by which the generative AI answers an incoming call from an unregistered or withheld phone number. The transmission means is realized by the specific processing unit 290 of the data processing device 12 as a means for documenting the matter with ChatGPT after the call ends and sending it by a messenger app or email. The emotion recognition means is realized by the specific processing unit 290 of the data processing device 12 or the control unit 46A of the robot 414 as a means for analyzing the tone of the user's voice and choice of words during the call and estimating the emotion. The correspondence between each means and the device or control unit is not limited to the above example, and various changes are possible.
The response means includes a dialogue generation means that combines natural language processing and speech synthesis. The dialogue generation means automatically answers an incoming call from an unregistered or withheld number, enabling a dialogue with the user. When such a call is detected, the generative AI responds using a pre-trained conversation model. This conversation model is trained on varied dialogue data so that it can handle a range of scenarios, and it provides appropriate answers and guidance in response to the caller's questions and requests. In addition, the response means has an AI dialogue generation function that automatically provides an appropriate response to an incoming call from an unregistered or withheld phone number. This function combines a dialogue management function, which identifies the caller's purpose and request early in the call and provides a corresponding response, with a voice generation function, which converts the generated response into natural-sounding speech. The dialogue management function can analyze the caller's intention by detecting specific keywords and phrases and generate an appropriate response. The voice generation function can convert a text-based response into speech in real time, giving the caller a natural conversation experience.
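As an illustration of the detection step described above, the following is a minimal sketch of how a call might be routed to the generative AI. The function name `should_ai_answer`, the representation of a withheld number as `None` or an empty string, and the digit-only comparison are all assumptions made for this example and are not specified in the disclosure.

```python
from typing import Optional, Set


def _digits_only(number: str) -> str:
    """Strip hyphens, spaces and other separators so numbers compare equal."""
    return "".join(ch for ch in number if ch.isdigit())


def should_ai_answer(caller_number: Optional[str], contacts: Set[str]) -> bool:
    """Return True when the generative AI should make the first response.

    Hypothetical rule: the AI answers if the number is withheld (no caller ID)
    or is not found in the user's registered contacts.
    """
    # A withheld number is assumed to arrive with no caller ID at all.
    if not caller_number:
        return True
    registered = {_digits_only(c) for c in contacts}
    return _digits_only(caller_number) not in registered
```

For example, `should_ai_answer("09012345678", {"090-1234-5678"})` is false because the number is registered, while a withheld call (`None`) is always handed to the AI.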
The transmission means includes a chatbot and a documentation means using natural language understanding technology. After the call ends, the contents of the call are documented accurately using an advanced natural language understanding model such as ChatGPT. The documentation means extracts and summarizes the main points of the call and can generate a document that conveys the matter concisely and clearly. The generated document is sent to the user via a messenger app or email. This process includes linkage with the user's email address or messenger account, and the document is sent automatically in the appropriate format. The transmission means can also convert the contents of the call into text and provide it to the user in an accessible form. After the call ends, the call content documentation function is activated to extract and summarize the main points of the conversation. The summarized text is sent to the user's designated email address or messenger app through an automatic delivery function. This automatic delivery function includes an email delivery function and an app linkage function, which format the document properly and ensure that it is delivered to the designated destination.
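The email side of the automatic delivery function could look roughly like the sketch below. It only builds the message using Python's standard `email.message.EmailMessage`; the function name `build_summary_email`, the sender address, and the subject line are assumptions for illustration, and actually transmitting the message (for example over SMTP) is left out.

```python
from email.message import EmailMessage


def build_summary_email(summary: str, to_addr: str,
                        from_addr: str = "assistant@example.com") -> EmailMessage:
    """Wrap a call summary in a plain-text email addressed to the user.

    The addresses and subject are placeholders, not values from the disclosure.
    """
    msg = EmailMessage()
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg["Subject"] = "Call summary"
    # set_content() stores the summary as a text/plain body.
    msg.set_content(summary)
    return msg
```

A messenger-app path would follow the same shape, with the summary formatted for that app's own message API instead of an email body.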
The emotion recognition means includes a voiceprint analysis means for performing voice analysis, and a language analysis means for estimating emotions from the choice of words. The voiceprint analysis means can detect characteristics of the user's voice, such as tone, pitch, and speed, using the microphone of the smart device, and can estimate the user's emotional state from these voice characteristics. The language analysis means can analyze the content of the user's speech and extract emotional context from the words and phrases used. These analysis results are used by the generative AI when making a response or adjusting the expression of documented matters. For example, if the user shows anxiety, the expression of the response or document is adjusted to be more gentle and reassuring. In addition, the emotion recognition means has a function of estimating emotions from the characteristics of the user's voice and language use during a call. The voiceprint analysis function can extract features such as the pitch, strength, and speed of the voice from the voice data collected by the microphone, and can estimate emotions by analyzing these characteristics. The language emotion analysis function can process the language data during a call, analyze the emotional meaning of the words and phrases used, and grasp the user's emotional state. These analytics can be used to adjust the tone of the AI's responses and the wording of documented calls to provide an appropriate emotional response to the user.
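To make the combination of voice characteristics and word choice concrete, here is a deliberately simple rule-based sketch. It is not the emotion engine of the disclosure: the word list, the thresholds, the baseline pitch, and the names `VoiceFeatures`, `estimate_emotion`, and `choose_tone` are all hypothetical, and a practical system would more likely use a trained model.

```python
from dataclasses import dataclass


@dataclass
class VoiceFeatures:
    pitch_hz: float       # average fundamental frequency during the utterance
    words_per_sec: float  # speaking rate


# Hypothetical lexicon of words suggesting anxiety.
ANXIETY_WORDS = {"worried", "afraid", "urgent", "scared", "help"}


def estimate_emotion(features: VoiceFeatures, transcript: str,
                     baseline_pitch_hz: float = 150.0) -> str:
    """Return "anxious" or "neutral" from two acoustic cues and word choice."""
    words = [w.strip(".,!?").lower() for w in transcript.split()]
    lexical_hits = sum(1 for w in words if w in ANXIETY_WORDS)

    score = 0
    if features.pitch_hz > 1.2 * baseline_pitch_hz:  # raised pitch
        score += 1
    if features.words_per_sec > 3.5:                 # fast speech
        score += 1
    score += min(lexical_hits, 2)                    # cap the lexical contribution
    return "anxious" if score >= 2 else "neutral"


def choose_tone(emotion: str) -> str:
    """Pick the response tone the generative AI is asked to use."""
    return "calm and reassuring" if emotion == "anxious" else "plain and businesslike"
```

The chosen tone could then be passed to the generative AI as part of its instructions, both for the spoken response and for the wording of the documented matter.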
Data collection methods that do not involve sensors include text data entered by users themselves and feedback on system usage. These are provided to the system through the user input reception function and feedback collection function, and are used as a valuable source of information for service improvement.
These means are intended to respond quickly and effectively to user requests and improve the quality of communication. In addition, the implementation of each means is flexibly performed by the data processing device or the control unit of the smart device, and can be modified in various ways to improve the efficiency and usability of the system.
The system of the present invention can additionally be equipped with a summary prediction means that predicts the caller's intention before a call from an unregistered or withheld number is answered. This allows the response means to generate a more accurate dialogue, realizing a meaningful exchange for the user. In addition, the generative AI can have a language selection function based on the caller's country or region, enabling automatic responses in multiple languages.
As for the transmission means, in addition to documenting the contents of the call, it can highlight important keywords and phrases to help the user quickly understand the document, and can automatically generate action items from the documented content and add them to the user's task list.
The emotion recognition means can detect changes in the user's emotions during a call in real time and dynamically adjust the tone and tempo of the conversation of the response means. Protocols can also be built in that, when a particular emotion is detected, prompt the user to contact an expert who can provide special support or advice according to the emotion.
The response means can be enhanced with a personalization function that analyzes the caller's past call history and related data to provide a more individualized response, making it possible to provide more relevant information to the user and increase the usefulness of the response.
Regarding the transmission means, a function can be added that suggests follow-up actions based on the documented content of the call. For example, tasks and events included in the call content can be automatically registered in a calendar app to help users manage their time.
Furthermore, the emotion recognition means may be capable of detecting the user's stress level or tension and providing appropriate advice on how to reduce stress or links to relaxation content, thereby supporting the user's mental health and promoting overall well-being.
The system of this embodiment may be given a number of additional functions to further improve its functionality. For example, a voiceprint recognition means can be added that, for a call from an unregistered or withheld number, identifies the caller by analyzing the caller's voiceprint and comparing it with data from previous calls. This allows the response means to respond more appropriately, based on prior information, if the caller has previously interacted with the system. The voiceprint recognition means also functions as a security measure, providing a reliable call experience for the user.
Regarding the transmission means, a function may be provided that, when documenting the call, structures its contents and classifies the text according to the importance of the information. This allows users to grasp important information more quickly when reading the document. In addition, the documented content can be customized according to the user's preferences and past behavior patterns to provide a more personal experience.
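One common way to compare a voiceprint with previous call data is to represent each voiceprint as a numeric feature vector and measure cosine similarity. The sketch below assumes such vectors already exist; how they are extracted is not covered, and the names `cosine_similarity` and `match_caller` and the 0.85 threshold are assumptions for illustration rather than details from the disclosure.

```python
import math
from typing import Dict, List, Optional


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_caller(voiceprint: List[float], known: Dict[str, List[float]],
                 threshold: float = 0.85) -> Optional[str]:
    """Return the ID of the most similar stored voiceprint, or None if none is close enough."""
    best_id, best_score = None, threshold
    for caller_id, stored in known.items():
        score = cosine_similarity(voiceprint, stored)
        if score >= best_score:
            best_id, best_score = caller_id, score
    return best_id
```

When `match_caller` returns an ID, the response means could load that caller's earlier interactions; when it returns `None`, the call is treated as a first contact.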
The emotion recognition means detects the user's stress level during a call, and if it is estimated that the user is under high stress, it provides guidance on relaxation methods and psychological support related to the content of the call, thereby providing a service that allows users to relax and reduce stress through the call.
Additionally, the response means may automatically suggest follow-up actions to the user based on the content of the call, for example by providing links to additional information about products or services mentioned during the call, or by suggesting next steps to help the user make decisions.
To improve the response means, a function may be provided that automatically changes the response style according to the caller's intention. For example, if the caller indicates an emergency, the AI is adjusted to provide quick and accurate instructions. This yields a system that can respond immediately to the caller's needs.
These additional functions are intended to improve the user experience and enable accurate understanding of call content and rapid response. Each function is expected to maximize the capabilities of data processing devices and smart device control units, further enhancing the usability of the system.
(Example 2)
The embodiment of the present invention is a system that includes an adjustment means that documents the matter with ChatGPT after the call ends and, when sending the document by messenger app or email, adjusts its wording using an emotion engine that recognizes the user's emotions. Specifically, the documented matter is input to the emotion engine, and the user's emotions are analyzed. Based on the analysis results, the wording of the document is adjusted appropriately. For example, if the user is feeling happy or excited, brighter and livelier expressions are used to share in the user's emotions.
The adjustment means is realized by the specific processing unit 290 of the data processing device 12 as a means for documenting the matter with ChatGPT after the call ends and adjusting the wording of the document using an emotion engine that recognizes the user's emotion when sending it by messenger app or email. The process of inputting the documented matter to the emotion engine and analyzing the user's emotion is performed according to the specific processing program 56 executed by the specific processing unit 290. The correspondence between each means and the device or control unit is not limited to the above example, and various changes are possible.
The conversation content extraction means converts the contents of the call into text data by making full use of voice recognition technology. After receiving the voice signal, the conversation content extraction means uses noise reduction means to remove background noise and improve the accuracy of the voice-to-text conversion. Furthermore, the conversation content extraction means uses cutting-edge voice recognition technology to accurately convert the contents of the call into text data, and in the process, the noise reduction means removes background noise and improves the conversion accuracy.
The text processing means corrects syntactic errors in the converted text data and runs a grammar check via the grammar checking means to maintain linguistic fluency. Through the grammar checking means, the text processing means also ensures that the converted text data is grammatically accurate and that the document keeps a natural flow.
The emotion recognition means includes an emotion extraction means based on text mining and emotion analysis techniques, which estimates the user's emotion from the wording and context of the generated text data. The emotion extraction means also identifies words, phrases, and grammatical patterns that express various emotions and classifies them into emotion categories such as positive, negative, and neutral. The emotion emphasis means also adjusts the tone and phrasing of the text according to the extracted emotions, making modifications to better convey the user's emotional state. The emotion extraction means also reads the user's emotion from the language patterns expressed in the document and adjusts the tone of the document based on that.
The adjustment means changes the expression of the text based on the emotion data analyzed by the emotion recognition means. When a positive emotion such as joy or excitement is detected, the expression enhancement means replaces the vocabulary used with brighter and more lively words to give the message a favorable impression. When a negative emotion such as sadness or anger is detected, the expression mitigation means softens the tone of the message and selects phrases that show empathy and understanding. When the user's emotion is positive, the expression enhancement means energizes the document, and when the emotion indicates a negative emotion, the expression mitigation means uses a more gentle expression.
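The expression enhancement and expression mitigation steps can be pictured as vocabulary substitution driven by the detected emotion category. The following is a toy sketch under that assumption: the two substitution tables, the empathy sentence, and the function name `adjust_expression` are invented for illustration, and the disclosure itself contemplates a generative AI performing the rewrite.

```python
import re

# Hypothetical substitution tables; a generative model would normally do this rewrite.
BRIGHTER = {"good": "wonderful", "happy": "delighted"}
SOFTER = {"problem": "concern", "failed": "did not go as planned"}


def adjust_expression(text: str, emotion: str) -> str:
    """Rewrite `text` for a "positive" or "negative" emotion; leave it unchanged otherwise."""
    if emotion == "positive":
        table = BRIGHTER   # expression enhancement
    elif emotion == "negative":
        table = SOFTER     # expression mitigation
    else:
        return text

    # Match any table key as a whole word, ignoring case.
    pattern = r"\b(" + "|".join(re.escape(word) for word in table) + r")\b"
    adjusted = re.sub(pattern,
                      lambda m: table[m.group(0).lower()],
                      text,
                      flags=re.IGNORECASE)

    if emotion == "negative":
        # Lead with a phrase showing empathy and understanding.
        adjusted = "I understand this is difficult. " + adjusted
    return adjusted
```

For instance, with a negative emotion the sentence "The delivery failed." becomes "I understand this is difficult. The delivery did not go as planned."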
The message sending means includes a function for linking with a messenger application or an email client and sends the adjusted text in an appropriate format. Furthermore, the transmission protocol selection means selects an optimal transmission protocol according to the platform and settings of the recipient and ensures delivery of the message. Furthermore, the user interface presentation means provides a preview screen for a final review of the document before sending, allowing the user to make final corrections as necessary. Furthermore, the message sending means sends the adjusted text in a format suitable for the messenger application or the email client, and the preview screen provides an interface for the user to make a final check of the document.
The above processes are realized by a specific processing unit built into the control unit of the smart device or the data processing device, depending on the device and settings used by the user. Moreover, these means are designed as modularized components, and can be easily replaced or expanded as system components. Furthermore, the correspondence between each means is set flexibly, and various changes can be made to accommodate system upgrades and customization. Moreover, this series of processes is modularized so that it can flexibly respond to devices and environments, and is designed to facilitate system upgrades and customization.
This embodiment can be further expanded to add a function for deeper understanding of the user's emotions and improving the quality of communication. For example, by incorporating a facial expression recognition function during video chat into the emotion engine, emotions can be analyzed from visual information as well, resulting in more accurate emotion judgment. To improve the accuracy of emotion recognition, the tone and pitch of the user's voice can also be analyzed and reflected in the text. Furthermore, by analyzing the user's past communication history and response patterns, the system can learn the individual's emotional expression style and adjust the text accordingly in a more personalized manner.
The text processing means has a function to suggest phrases inspired by literary works and poetry to enhance the diversity and creativity of expressions, allowing users to express their emotions more richly. It can also select appropriate expressions according to the environment and situation in which the communication takes place, taking into account the social context and cultural background.
The message sending means may add a function that predicts the emotional impact of the sent text on the recipient, allowing users to communicate more responsibly. In addition, the AI can predict the recipient's reaction and use that information to suggest the next communication strategy the user should take.
Overall, these features will help users reflect their emotions more accurately and sensitively in their communications, contributing to building deeper human relationships. These advanced methods can be effectively used by individuals as well as corporate customer support and CRM systems to deepen customer relationships. Furthermore, these systems are expected to be applied in areas where human emotions play an important role, such as user education and counseling.
The system leverages an emotion engine to provide document expression adjustments based on the user's emotions. To extend this functionality, sensors that capture the user's biometric information can be integrated to more accurately read emotions from physiological responses such as heart rate and skin conductivity. Data from the sensors is analyzed in real time, allowing the tone of the document to be adjusted instantly.
In addition, the system may be equipped with a personalized learning function that continuously analyzes the user's daily communication to understand the user's own style of expression and preferences, allowing the system to suggest more natural and individualized documents that reflect the user's personality.
Multilingual support may also be enhanced so that emotional nuances in different languages are captured, improving international communication and understanding among multilingual users.
To protect user privacy, functions for anonymizing and encrypting emotion data may also be added to strengthen security, allowing users to use the system with peace of mind.
With a view to applications in education and counseling, a function may be developed that uses the results of emotion recognition to support communication skills training. The training program includes practice in emotional expression and learning appropriate communication techniques.
Finally, to improve usability, the user interface may be made rich and intuitive, supporting a variety of gestures and voice commands and letting users easily intervene in the document adjustment process, so that they can fine-tune the wording as they wish and achieve more personal communication.
(Example 3)
The embodiment of the present invention is a system that uses an emotion engine that recognizes the user's emotions during a call and includes an adjustment means for adjusting the response and dialogue content to match the user's emotions. Specifically, the user's tone of voice and choice of words are input to the emotion engine during a call to estimate the user's emotions. The response and dialogue content of the generative AI are adjusted based on the estimated emotions. For example, if the user shows sad emotions, words of empathy are used to provide encouragement and support.
The adjustment means is realized by the specific processing unit 290 of the data processing device 12 or the control unit 46A of the robot 414 as a means for adjusting the response or content of the dialogue to match the user's emotions using an emotion engine that recognizes the user's emotions during a call. The process of inputting the user's tone of voice and choice of words during a call to the emotion engine and estimating the user's emotions is performed according to the specific processing program 56 executed by the specific processing unit 290. The correspondence between each means and the device or control unit is not limited to the above example, and various modifications are possible.
The voice data collection means includes a microphone with high sensitivity and noise canceling function. The microphone is equipped with a digital signal processing algorithm to remove ambient noise and clearly record the user's voice. The voice data collection means also employs a highly sensitive microphone that captures subtle acoustic characteristics from the user's speech and combines this with digital signal processing technology to effectively remove ambient noise. The microphone accurately captures the characteristics of the user's voice and provides basic data for detecting changes in emotions.
The voice feature extraction means extracts features that may reflect human emotions from the voice signal. This extraction means includes a spectrogram analysis function and a pitch tracking function for performing acoustic feature analysis, and an emotion identification function using an acoustic model. The voice feature extraction means also uses acoustic analysis tools such as a spectrum analysis function and a pitch analysis function to extract features that suggest emotions from the voice signal, and supplies these data to the emotion estimation model.
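As one possible form of the pitch tracking function mentioned above, the sketch below estimates the fundamental frequency of a short voice frame by finding the lag with the highest autocorrelation. Autocorrelation is chosen here only as a well-known textbook method; the disclosure does not name an algorithm, and the function name `estimate_pitch_hz` and the 75 to 400 Hz search range are assumptions.

```python
import math
from typing import Sequence


def estimate_pitch_hz(samples: Sequence[float], sample_rate: int,
                      fmin: float = 75.0, fmax: float = 400.0) -> float:
    """Estimate the fundamental frequency of one frame by autocorrelation.

    Returns 0.0 when no positive correlation peak is found (e.g. silence).
    """
    min_lag = int(sample_rate / fmax)  # shortest period considered
    max_lag = int(sample_rate / fmin)  # longest period considered
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        corr = sum(samples[i] * samples[i + lag]
                   for i in range(len(samples) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else 0.0
```

The resulting pitch value, tracked frame by frame, is the kind of feature that could be supplied to the emotion estimation model alongside spectral features.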
The emotion analysis means includes an emotion estimation model based on machine learning, which estimates the user's emotional state using the features extracted by the voice feature extraction means as input. The emotion estimation model is composed of classifiers such as neural networks, support vector machines, and decision trees trained based on training data, and classifies the user's emotions into categories such as positive, negative, and neutral. The emotion estimation model improves its accuracy through continuous learning, and evolves to be able to respond to subtle changes in the user's perception of emotions. In addition, the emotion analysis means has an emotion estimation model that utilizes machine learning technology, and analyzes the user's emotions based on the extracted voice features. This model combines various machine learning algorithms to identify emotion categories from the user's utterance, and estimates the user's emotional state based on this. The estimated emotional state is important information for adjusting the system's response to the user's utterance and behavior.
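The classification step can be illustrated with a much simpler stand-in than the neural networks, support vector machines, or decision trees named above: a nearest-centroid classifier, which averages the training feature vectors per emotion category and assigns a new vector to the closest average. The function names, the two-dimensional feature vectors, and the labels in the usage note are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def train_centroids(examples: Sequence[Tuple[List[float], str]]) -> Dict[str, List[float]]:
    """Average the feature vectors of each emotion label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for k, value in enumerate(features):
            sums[label][k] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def classify(centroids: Dict[str, List[float]], features: List[float]) -> str:
    """Return the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))
```

Trained on a few labeled vectors such as `([0.9, 0.8], "negative")` and `([0.1, 0.2], "positive")`, the classifier assigns a new vector to whichever category's average it lies closest to. Retraining with newly collected examples corresponds loosely to the continuous learning described above.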
The dialogue adjustment means includes a response generation function using a generative AI model, which dynamically adjusts the content of the dialogue according to the emotion estimated by the emotion analysis means. The response generation function utilizes natural language generation technology to generate a response sentence using words and tones appropriate for the user's emotion. For example, if the user is expressing sad emotion, a response including empathetic expressions and comforting words is generated. This response generation function has a context-aware processing function for taking into account the context of the conversation and providing content that matches the user's emotion and the purpose of the communication. In addition, the generated response is constructed to be natural to the user and to meet the emotional needs. The response generation function is realized by a machine learning model trained on a large-scale conversation dataset and can adapt to the user's language and speaking style. In addition, the dialogue adjustment means includes a response generation function using a generative AI model, which generates dialogue content appropriately corresponding to the estimated emotion. This function selects words and a dialogue tone that correspond to the user's emotion and provides a response that matches the user's current emotion and the context of the conversation. The response generation function is designed to perform context-aware processing and provide the information and support required by the user in an appropriate form. This feature is capable of generating responses that adapt to a user's language and emotions based on models trained from large amounts of conversational data.
Non-sensor-based data collection methods include text information entered directly by users into the system and metadata obtained from call records. These are used as supplementary information when analyzing user behavioral patterns and preferences, and contribute to improving the accuracy of the emotion engine and optimizing response generation functions. User-entered text information is also used as training data for emotion estimation models as part of the emotion analysis method.
The system has the ability to analyze the user's emotions in real time during a call and automatically adjust the dialogue content, so it can integrate facial expression recognition technology in addition to voice data to capture even subtler changes in emotions. It uses a webcam or a camera on a smart device to analyze the user's facial expressions and improve the accuracy of emotion analysis. This added facial expression recognition function will more accurately recognize the user's emotions and make it possible to adjust the dialogue according to even subtler emotional changes.
It is also possible to incorporate wearable devices to capture the user's physiological signals. Measuring physiological responses such as heart rate and electrodermal activity can provide insight into deeper emotions that cannot be captured by tone of voice or facial expressions alone. Integrating this data can further improve the accuracy of emotion analysis and generate more appropriate dialogue responses.
It would also be effective to add customization features to the dialogue adjustment method that take into account the user's cultural background and personal values. A user profile can be constructed, and the tone and content of the dialogue can be further personalized based on that information. This makes it possible to provide detailed support tailored to each individual user.
Furthermore, to promote the evolution of the emotion estimation model, a mechanism may be built that collects emotion data through crowdsourcing and incorporates feedback from a variety of users to update the model continuously, resulting in a flexible system that can handle a variety of emotional expressions and languages.
Applications in the fields of education and mental health care can also be considered. For example, in the field of education, it could be used to present educational materials adapted to students' emotions or in counseling sessions. In mental health care, it could recognize the user's emotions and provide dialogue support to reduce stress and anxiety. This would enable a more effective approach to the problems the user is facing.
Regarding privacy protection for the system, security measures are strengthened so that users' emotion data is stored safely and the risk of information leakage is minimized through appropriate access control and encryption technology. This allows users to use the system with peace of mind.
Ultimately, it is hoped that the personalized interaction experience provided by this system will be developed into services that improve the quality of users' lives.
The embodiment of the present invention is applicable not only to telephone calls but also to video conferencing and online education platforms. For example, a lecturer can grasp the emotions of students in real time and adjust the progress of the curriculum according to the emotions, providing a more effective learning experience. In video conferencing, dialogue management reflecting the emotions of participants is also performed, promoting a productive and positive meeting environment.
The system could also be enhanced to monitor health conditions based on emotional responses: for example, if a user's tone of voice or manner of speaking indicates negative emotions over a period of time, a mental health professional could be notified to intervene if necessary.
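The idea of reacting only when negative emotions persist over a period can be sketched with a sliding window over recent emotion labels. The class name `NegativeEmotionMonitor`, the window length, and the 70% threshold are assumptions for illustration; the disclosure does not specify how the period or the trigger condition is defined.

```python
from collections import deque


class NegativeEmotionMonitor:
    """Flag sustained negative emotion over the most recent observations."""

    def __init__(self, window: int = 7, threshold: float = 0.7) -> None:
        # Only the latest `window` emotion labels are kept.
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def record(self, emotion: str) -> bool:
        """Store one label; return True when a notification should be sent."""
        self.history.append(emotion)
        if len(self.history) < self.history.maxlen:
            return False  # not enough data for the full period yet
        negative = sum(1 for e in self.history if e == "negative")
        return negative / len(self.history) >= self.threshold
```

A `True` result would be the point at which the system notifies a mental health professional, subject to whatever consent and privacy controls the deployment requires.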
Furthermore, to enhance the emotion engine, a function may be incorporated that analyzes the emotional patterns of the user's daily life and, based on that information, provides advice on long-term emotion management and stress reduction. By analyzing the user's daily rhythm and activity patterns and predicting emotional ups and downs, the system suggests content for relaxation and motivation at appropriate times.
This system is also expected to be applied in the field of customer support. For example, if call center operators could grasp the customer's emotions in real time and detect negative emotions such as dissatisfaction or anger, they could take immediate action to address the issue, contributing to improving customer satisfaction.
In the fields of games and entertainment, dynamic changes to content based on the user's emotions are expected to have the effect of increasing immersion and enjoyment. In-game characters will react to the player's emotions, changing the story development and dialogue, creating a more personalized experience.
Furthermore, the system may be integrated with voice assistants and virtual reality (VR) to realize more natural dialogue based on the user's emotions. A voice assistant grasps the user's emotions and provides information and services tailored to individual needs. In a VR environment, the scenario and environment change according to the user's emotions, providing an experience that matches the emotions in real time.
This system also has the potential to revolutionize the design of user interfaces (UI) and user experiences (UX). Using emotion recognition technology, it can provide UI and UX optimized for the user's emotions, increasing user satisfaction. For example, websites and applications can grasp the user's emotions in real time and adjust the way content is presented and the form of interaction.
Ultimately, it is hoped that the emotion regulation function provided by this system will become widely used throughout society as a tool to improve the quality of human relationships and increase the effectiveness of communication.

（形態例１）
ステップ１：未登録または非通知の電話番号からの着信があった場合、生成系AIが応答する。
ステップ２：通話終了後、用件をチャットGPTで文書化する。
ステップ３：文書化された用件を感情エンジンに入力し、ユーザの感情を分析する。
ステップ４：分析結果に基づいて、文書の表現を調整する。
ステップ５：調整された文書をメッセンジャーアプリやメールで送信する。
（形態例２）
ステップ１：通話終了後、用件をチャットGPTで文書化する。
ステップ２：文書化された用件を感情エンジンに入力し、ユーザの感情を分析する。
ステップ３：分析結果に基づいて、文書の表現を調整する。
ステップ４：調整された文書を選択し、送信先を指定する（例：家族）。
ステップ５：指定された送信先に文書をメッセンジャーアプリやメールで送信する。
（形態例３）
ステップ１：通話中にユーザの声のトーンや言葉の選択などを感情エンジンに入力し、ユーザの感情を分析する。
ステップ２：分析結果に基づいて、生成系AIの応答や対話の内容を調整する。
ステップ３：通話終了後、用件をチャットGPTで文書化する。文書化された用件は、感情エンジンに入力するために使用される。
(Example 1)
Step 1: When a call comes in from an unregistered or withheld phone number, the generative AI answers.
Step 2: After the call ends, the matter of the call is documented using ChatGPT.
Step 3: The documented matter is input into the emotion engine to analyze the user's emotions.
Step 4: The wording of the document is adjusted based on the analysis results.
Step 5: The adjusted document is sent via a messenger app or email.
(Example 2)
Step 1: After the call ends, the matter of the call is documented using ChatGPT.
Step 2: The documented matter is input into the emotion engine to analyze the user's emotions.
Step 3: The wording of the document is adjusted based on the analysis results.
Step 4: The adjusted document is selected and a recipient is specified (e.g., family).
Step 5: The document is sent to the specified recipient via a messenger app or email.
(Example 3)
Step 1: During the call, the user's tone of voice, choice of words, etc. are input into the emotion engine to analyze the user's emotions.
Step 2: The generative AI's responses and dialogue content are adjusted based on the analysis results.
Step 3: After the call ends, the matter of the call is documented using ChatGPT. The documented matter is used as input to the emotion engine.
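The flow of Example 1 can be sketched roughly as follows. This is only an illustrative outline under stated assumptions: the names `Call`, `needs_ai_answer`, and `handle_call` are hypothetical, the call is assumed to be already transcribed to text, and the summarizer, emotion engine, adjustment step, and sender are passed in as plain callables rather than tied to any particular service.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass
class Call:
    """Hypothetical record of an incoming call."""
    caller_number: Optional[str]  # None models a withheld number
    transcript: str               # call content, assumed already converted to text


def needs_ai_answer(call: Call, contacts: Set[str]) -> bool:
    """Step 1: the generative AI answers only withheld or unregistered numbers."""
    return call.caller_number is None or call.caller_number not in contacts


def handle_call(
    call: Call,
    contacts: Set[str],
    summarize: Callable[[str], str],        # assumed stand-in for the documenting step
    analyze_emotion: Callable[[str], str],  # assumed stand-in for the emotion engine
    adjust: Callable[[str, str], str],      # rewrites the document for the emotion
    send: Callable[[str], None],            # messenger app or email sender
) -> Optional[str]:
    """Run Steps 1 to 5 and return the sent message, or None if no AI answer was needed."""
    if not needs_ai_answer(call, contacts):
        return None
    summary = summarize(call.transcript)   # Step 2: document the matter
    emotion = analyze_emotion(summary)     # Step 3: analyze the emotion
    message = adjust(summary, emotion)     # Step 4: adjust the wording
    send(message)                          # Step 5: send the adjusted document
    return message
```

Passing the four processing steps as callables keeps the sketch independent of any specific model or messaging API; Example 2 would differ only in letting the caller of `send` choose the recipient.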

特定処理部２９０は、特定処理の結果をロボット４１４に送信する。ロボット４１４では、制御部４６Ａが、スピーカ２４０及び制御対象４４３に対して特定処理の結果を出力させる。マイクロフォン２３８は、特定処理の結果に対するユーザ入力を示す音声を取得する。制御部４６Ａは、マイクロフォン２３８によって取得されたユーザ入力を示す音声データをデータ処理装置１２に送信する。データ処理装置１２では、特定処理部２９０が音声データを取得する。 The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the control target 443 to output the result of the specific processing. The microphone 238 acquires voice indicating the user input for the result of the specific processing. The control unit 46A transmits voice data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the voice data.

上記実施形態では、データ処理装置１２によって特定処理が行われる形態例を挙げたが、本開示の技術はこれに限定されず、ロボット４１４によって特定処理が行われるようにしてもよい。 In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology disclosed herein is not limited to this, and the specific processing may also be performed by the robot 414.

なお、感情エンジンとしての感情特定モデル５９は、特定のマッピングに従い、ユーザの感情を決定してよい。具体的には、感情特定モデル５９は、特定のマッピングである感情マップ（図９参照）に従い、ユーザの感情を決定してよい。また、感情特定モデル５９は、同様に、ロボットの感情を決定し、特定処理部２９０は、ロボットの感情を用いた特定処理を行うようにしてもよい。 The emotion identification model 59, which serves as an emotion engine, may determine the emotion of the user according to a specific mapping. Specifically, the emotion identification model 59 may determine the emotion of the user according to an emotion map (see FIG. 9), which is a specific mapping. Similarly, the emotion identification model 59 may determine the emotion of the robot, and the specific processing unit 290 may perform the specific processing using the emotion of the robot.

図９は、複数の感情がマッピングされる感情マップ４００を示す図である。感情マップ４００において、感情は、中心から放射状に同心円に配置されている。同心円の中心に近いほど、原始的状態の感情が配置されている。同心円のより外側には、心境から生まれる状態や行動を表す感情が配置されている。感情とは、情動や心的状態も含む概念である。同心円の左側には、概して脳内で起きる反応から生成される感情が配置されている。同心円の右側には概して、状況判断で誘導される感情が配置されている。同心円の上方向及び下方向には、概して脳内で起きる反応から生成され、かつ、状況判断で誘導される感情が配置されている。また、同心円の上側には、「快」の感情が配置され、下側には、「不快」の感情が配置されている。このように、感情マップ４００では、感情が生まれる構造に基づいて複数の感情がマッピングされており、同時に生じやすい感情が、近くにマッピングされている。 FIG. 9 is a diagram showing an emotion map 400 on which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer an emotion is to the center of the concentric circles, the more primitive the state it represents. Emotions that represent states and actions arising from a state of mind are arranged toward the outside of the concentric circles. Here, "emotion" is a concept that also includes affect and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are arranged. On the right side of the concentric circles, emotions that are generally induced by situational judgment are arranged. In the upward and downward directions of the concentric circles, emotions that are generally both generated from reactions occurring in the brain and induced by situational judgment are arranged. In addition, "pleasant" emotions are arranged on the upper side of the concentric circles, and "unpleasant" emotions are arranged on the lower side. In this way, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions are generated, and emotions that tend to occur simultaneously are mapped close to each other.

これらの感情は、感情マップ４００の３時の方向に分布しており、普段は安心と不安のあたりを行き来する。感情マップ４００の右半分では、内部的な感覚よりも状況認識の方が優位に立つため、落ち着いた印象になる。 These emotions are distributed in the three o'clock direction of emotion map 400, and usually fluctuate between relief and anxiety. In the right half of emotion map 400, situational awareness takes precedence over internal sensations, resulting in a sense of calm.

感情マップ４００の内側は心の中、感情マップ４００の外側は行動を表すため、感情マップ４００の外側に行くほど、感情が目に見える（行動に表れる）ようになる。 The inside of emotion map 400 represents what is going on inside the mind, and the outside of emotion map 400 represents behavior, so the further out you go on emotion map 400, the more visible (expressed in behavior) the emotions become.
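The layout of the emotion map 400 described above can be illustrated with a small coordinate sketch. It assumes, purely for illustration, that a position on the map is given as Cartesian coordinates (x, y) with the center at the origin; the function name `describe_position`, the returned labels, and the `max_radius` parameter are hypothetical and do not appear in the disclosure.

```python
import math
from typing import Dict, Union


def describe_position(x: float, y: float, max_radius: float = 1.0) -> Dict[str, Union[str, float]]:
    """Interpret a point on the emotion map (assumed coordinates, center at origin).

    - left (x < 0): emotions generated from reactions in the brain
    - right (x > 0): emotions induced by situational judgment
    - on the vertical axis (x == 0): both apply
    - upper half (y > 0): pleasant; lower half (y < 0): unpleasant
    - distance from the center: how visibly the emotion shows in behavior
    """
    if x < 0:
        origin = "brain reaction"
    elif x > 0:
        origin = "situational judgment"
    else:
        origin = "both"

    if y > 0:
        valence = "pleasant"
    elif y < 0:
        valence = "unpleasant"
    else:
        valence = "neutral"

    # 0.0 = fully internal (center), 1.0 = fully expressed in behavior (outer edge)
    visibility = min(math.hypot(x, y) / max_radius, 1.0)
    return {"origin": origin, "valence": valence, "visibility": visibility}
```

For example, a point on the right half near the horizontal axis would be reported as induced by situational judgment, with a visibility that grows as the point moves outward.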

ここで、人の感情は、姿勢や血糖値のような様々なバランスを基礎としており、それらのバランスが理想から遠ざかると不快、理想に近づくと快という状態を示す。ロボットや自動車やバイク等においても、姿勢やバッテリー残量のような様々なバランスを基礎として、それらのバランスが理想から遠ざかると不快、理想に近づくと快という状態を示すように感情を作ることができる。感情マップは、例えば、光吉博士の感情地図（音声感情認識及び情動の脳生理信号分析システムに関する研究、徳島大学、博士論文：https://ci.nii.ac.jp/naid/500000375379）に基づいて生成されてよい。感情地図の左半分には、感覚が優位にたつ「反応」と呼ばれる領域に属する感情が並ぶ。また、感情地図の右半分には、状況認識が優位にたつ「状況」と呼ばれる領域に属する感情が並ぶ。 Here, human emotions are based on various balances such as posture and blood sugar level, and when these balances are far from the ideal, it indicates an unpleasant state, and when they are close to the ideal, it indicates a pleasant state. Emotions can also be created for robots, cars, motorcycles, etc., based on various balances such as posture and remaining battery power, so that when these balances are far from the ideal, it indicates an unpleasant state, and when they are close to the ideal, it indicates a pleasant state. The emotion map may be generated, for example, based on the emotion map of Dr. Mitsuyoshi (Research on speech emotion recognition and emotion brain physiological signal analysis system, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map is lined with emotions that belong to an area called "reaction" where sensation is dominant. The right half of the emotion map is lined with emotions that belong to an area called "situation" where situation recognition is dominant.
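One possible reading of the balance-based idea above is a score that falls as measured quantities move away from their ideal values. The sketch below is an assumption-laden illustration, not the disclosed method: the function name `balance_valence`, the per-quantity `tolerances` used for normalization, and the linear scoring are all hypothetical choices.

```python
from typing import Dict


def balance_valence(
    readings: Dict[str, float],
    ideals: Dict[str, float],
    tolerances: Dict[str, float],
) -> float:
    """Return a score in [-1, 1] averaged over all balances.

    +1 means every reading is exactly at its ideal (pleasant);
    -1 means every reading deviates by its tolerance or more (unpleasant).
    The tolerance per quantity is an assumed normalization constant.
    """
    scores = []
    for name, ideal in ideals.items():
        deviation = abs(readings[name] - ideal) / tolerances[name]
        scores.append(1.0 - 2.0 * min(deviation, 1.0))
    return sum(scores) / len(scores)
```

For a robot, the readings might be something like remaining battery level and body tilt; a drained battery then lowers the score toward the unpleasant side even when posture is ideal.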

感情マップでは学習を促す感情が２つ定義される。１つは、状況側にあるネガティブな「懺悔」や「反省」の真ん中周辺の感情である。つまり、「もう２度とこんな想いはしたくない」「もう叱られたくない」というネガティブな感情がロボットに生じたときである。もう１つは、反応側にあるポジティブな「欲」のあたりの感情である。つまり、「もっと欲しい」「もっと知りたい」というポジティブな気持ちのときである。 The emotion map defines two emotions that encourage learning. The first is the negative emotion around the middle of "repentance" or "reflection" on the situation side. In other words, this is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the positive emotion around "desire" on the response side. In other words, this is when the robot has positive feelings such as "I want more" or "I want to know more."

感情特定モデル５９は、ユーザ入力を、予め学習されたニューラルネットワークに入力し、感情マップ４００に示す各感情を示す感情値を取得し、ユーザの感情を決定する。このニューラルネットワークは、ユーザ入力と、感情マップ４００に示す各感情を示す感情値との組み合わせである複数の学習データに基づいて予め学習されたものである。また、このニューラルネットワークは、図１０に示す感情マップ９００のように、近くに配置されている感情同士は、近い値を持つように学習される。図１０では、「安心」、「安穏」、「心強い」という複数の感情が、近い感情値となる例を示している。 The emotion identification model 59 inputs the user input to a pre-trained neural network, obtains an emotion value for each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is trained in advance on multiple training data items, each a combination of a user input and the emotion values for the emotions shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close to each other have similar values, as in the emotion map 900 shown in FIG. 10. FIG. 10 shows an example in which multiple emotions, "peace of mind," "calm," and "reassuring," have similar emotion values.
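The two ideas in this paragraph, choosing an emotion from per-emotion values and keeping neighboring emotions at similar values, can be illustrated without a neural network. The sketch below is a simplified stand-in under stated assumptions: the emotion values are given as a plain dictionary rather than produced by a trained model, and `dominant_emotion` and `neighbor_penalty` are hypothetical helper names. A squared-difference penalty of this kind is one common way to encourage nearby items to take similar values; the disclosure does not state which training objective is used.

```python
from typing import Dict, Iterable, Tuple


def dominant_emotion(values: Dict[str, float]) -> str:
    """Pick the emotion with the highest emotion value."""
    return max(values, key=lambda label: values[label])


def neighbor_penalty(
    values: Dict[str, float],
    neighbors: Iterable[Tuple[str, str]],
) -> float:
    """Sum of squared differences between emotions that sit close on the map.

    A small result means neighboring emotions have similar values.
    The neighbor pairs are an assumed input describing map adjacency.
    """
    return sum((values[a] - values[b]) ** 2 for a, b in neighbors)
```

With values such as 0.8 for "peace of mind", 0.75 for "calm", and 0.7 for "reassuring", the penalty over those adjacent pairs is small, which matches the behavior described for FIG. 10.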

上記実施形態では、１台のコンピュータ２２によって特定処理が行われる形態例を挙げたが、本開示の技術はこれに限定されず、コンピュータ２２を含めた複数のコンピュータによる特定処理に対する分散処理が行われるようにしてもよい。 In the above embodiment, an example was given in which a specific process is performed by one computer 22, but the technology disclosed herein is not limited to this, and distributed processing of the specific process may be performed by multiple computers, including computer 22.

上記実施形態では、ストレージ３２に特定処理プログラム５６が格納されている形態例を挙げて説明したが、本開示の技術はこれに限定されない。例えば、特定処理プログラム５６がＵＳＢ（Universal Serial Bus）メモリなどの可搬型のコンピュータ読み取り可能な非一時的格納媒体に格納されていてもよい。非一時的格納媒体に格納されている特定処理プログラム５６は、データ処理装置１２のコンピュータ２２にインストールされる。プロセッサ２８は、特定処理プログラム５６に従って特定処理を実行する。 In the above embodiment, an example has been described in which the specific processing program 56 is stored in the storage 32, but the technology of the present disclosure is not limited to this. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a Universal Serial Bus (USB) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing in accordance with the specific processing program 56.

また、ネットワーク５４を介してデータ処理装置１２に接続されるサーバ等の格納装置に特定処理プログラム５６を格納させておき、データ処理装置１２の要求に応じて特定処理プログラム５６がダウンロードされ、コンピュータ２２にインストールされるようにしてもよい。 The specific processing program 56 may also be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed in the computer 22 in response to a request from the data processing device 12.

なお、ネットワーク５４を介してデータ処理装置１２に接続されるサーバ等の格納装置に特定処理プログラム５６の全てを格納させておいたり、ストレージ３２に特定処理プログラム５６の全てを記憶させたりしておく必要はなく、特定処理プログラム５６の一部を格納させておいてもよい。 It is not necessary to store all of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store all of the specific processing program 56 in the storage 32; only a portion of the specific processing program 56 may be stored.

特定処理を実行するハードウェア資源としては、次に示す各種のプロセッサを用いることができる。プロセッサとしては、例えば、ソフトウェア、すなわち、プログラムを実行することで、特定処理を実行するハードウェア資源として機能する汎用的なプロセッサであるＣＰＵが挙げられる。また、プロセッサとしては、例えば、ＦＰＧＡ（Field-Programmable Gate Array）、ＰＬＤ（Programmable Logic Device）、又はＡＳＩＣ（Application Specific Integrated Circuit）などの特定の処理を実行させるために専用に設計された回路構成を有するプロセッサである専用電気回路が挙げられる。何れのプロセッサにもメモリが内蔵又は接続されており、何れのプロセッサもメモリを使用することで特定処理を実行する。 The various processors listed below can be used as hardware resources for executing specific processes. Examples of processors include a CPU, which is a general-purpose processor that functions as a hardware resource for executing specific processes by executing software, i.e., a program. Examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which are processors with a circuit configuration designed specifically to execute specific processes. All of these processors have built-in or connected memory, and all of these processors execute specific processes by using the memory.

特定処理を実行するハードウェア資源は、これらの各種のプロセッサのうちの１つで構成されてもよいし、同種又は異種の２つ以上のプロセッサの組み合わせ（例えば、複数のＦＰＧＡの組み合わせ、又はＣＰＵとＦＰＧＡとの組み合わせ）で構成されてもよい。また、特定処理を実行するハードウェア資源は１つのプロセッサであってもよい。 The hardware resource that executes the specific process may be composed of one of these various processors, or may be composed of a combination of two or more processors of the same or different types (e.g., a combination of multiple FPGAs, or a combination of a CPU and an FPGA). The hardware resource that executes the specific process may also be a single processor.

１つのプロセッサで構成する例としては、第１に、１つ以上のＣＰＵとソフトウェアの組み合わせで１つのプロセッサを構成し、このプロセッサが、特定処理を実行するハードウェア資源として機能する形態がある。第２に、ＳｏＣ（System-on-a-chip）などに代表されるように、特定処理を実行する複数のハードウェア資源を含むシステム全体の機能を１つのＩＣチップで実現するプロセッサを使用する形態がある。このように、特定処理は、ハードウェア資源として、上記各種のプロセッサの１つ以上を用いて実現される。 As an example of a configuration using a single processor, first, there is a configuration in which one processor is configured by combining one or more CPUs with software, and this processor functions as a hardware resource that executes a specific process. Secondly, there is a configuration in which a processor is used that realizes the functions of the entire system, including multiple hardware resources that execute a specific process, on a single IC chip, as typified by SoC (System-on-a-chip). In this way, a specific process is realized using one or more of the various processors mentioned above as hardware resources.

更に、これらの各種のプロセッサのハードウェア的な構造としては、より具体的には、半導体素子などの回路素子を組み合わせた電気回路を用いることができる。また、上記の特定処理はあくまでも一例である。従って、主旨を逸脱しない範囲内において不要なステップを削除したり、新たなステップを追加したり、処理順序を入れ替えたりしてもよいことは言うまでもない。 More specifically, the hardware structure of these various processors can be an electric circuit that combines circuit elements such as semiconductor elements. The specific processing described above is merely an example. It goes without saying that unnecessary steps can be deleted, new steps can be added, and the processing order can be changed without departing from the spirit of the invention.

以上に示した記載内容及び図示内容は、本開示の技術に係る部分についての詳細な説明であり、本開示の技術の一例に過ぎない。例えば、上記の構成、機能、作用、及び効果に関する説明は、本開示の技術に係る部分の構成、機能、作用、及び効果の一例に関する説明である。よって、本開示の技術の主旨を逸脱しない範囲内において、以上に示した記載内容及び図示内容に対して、不要な部分を削除したり、新たな要素を追加したり、置き換えたりしてもよいことは言うまでもない。また、錯綜を回避し、本開示の技術に係る部分の理解を容易にするために、以上に示した記載内容及び図示内容では、本開示の技術の実施を可能にする上で特に説明を要しない技術常識等に関する説明は省略されている。 The above description and illustrations are a detailed explanation of the parts related to the technology of the present disclosure, and are merely an example of the technology of the present disclosure. For example, the above explanation of the configuration, functions, actions, and effects is an explanation of an example of the configuration, functions, actions, and effects of the parts related to the technology of the present disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements may be added, or replacements may be made to the above description and illustrations, within the scope of the gist of the technology of the present disclosure. Also, in order to avoid confusion and to facilitate understanding of the parts related to the technology of the present disclosure, the above description and illustrations omit explanations of technical common knowledge that do not require particular explanation to enable the implementation of the technology of the present disclosure.

本明細書に記載された全ての文献、特許出願及び技術規格は、個々の文献、特許出願及び技術規格が参照により取り込まれることが具体的かつ個々に記された場合と同程度に、本明細書中に参照により取り込まれる。 All publications, patent applications, and technical standards described in this specification are incorporated by reference into this specification to the same extent as if each individual publication, patent application, and technical standard was specifically and individually indicated to be incorporated by reference.

以上の実施形態に関し、更に以下を開示する。 The following is further disclosed regarding the above embodiment.

（付記１）
未登録または非通知の電話番号からの着信時に、生成系AIが最初の応答を行い（生成系AIの回答を音声に変換して応答）、通話終了後に用件をチャットGPTで文書化し、メッセンジャーアプリやメールで送信するシステムであって、
家族や知人と名乗る場合には、専用の質問や合言葉を使用して確認する手段と、
配送業者は、業者用の合言葉を使用して本人につながる手段と、
通話中には、電話番号をチェックし、悪用歴のある番号の場合は警察に連携する手段とを含むシステム。
（付記２）
付記１に記載のシステムであって、通話終了後に用件をチャットGPTで文書化し、メッセンジャーアプリやメールで送信する際に、家族宛に送信可能な手段を具備することを特徴とするシステム。
（付記３）
付記１に記載のシステムであって、通話中に電話番号をチェックし、悪用歴のある番号の場合には警察に連携する際に、警察との連携手段を具備することを特徴とするシステム。
(Appendix 1)
A system in which, when a call is received from an unregistered or withheld phone number, a generative AI makes the initial response (the generative AI's answer is converted into speech for the response), and after the call ends, the matter of the call is documented using ChatGPT and sent via a messenger app or email, the system comprising:
a means for verifying a caller who claims to be a family member or acquaintance by using dedicated questions or a password;
a means by which a delivery company is connected to the user by using a password for delivery companies; and
a means for checking the phone number during the call and coordinating with the police if the number has a history of misuse.
(Appendix 2)
The system according to Appendix 1, characterized by comprising a means capable of sending to family members when, after the call ends, the matter of the call is documented using ChatGPT and sent via a messenger app or email.
(Appendix 3)
The system according to Appendix 1, characterized by comprising a means for coordinating with the police when the phone number is checked during the call and the number has a history of misuse.

（付記１）
未登録または非通知の電話番号からの着信時に、生成系AIが応答する手段と、通話終了後に用件をチャットGPTで文書化し、メッセンジャーアプリやメールで送信する手段とを具備するシステムであって、さらに、ユーザの感情を認識する感情エンジンを組み合わせる手段とを含むシステム。
（付記２）
付記１に記載のシステムであって、通話終了後に用件をチャットGPTで文書化し、メッセンジャーアプリやメールで送信する際に、ユーザの感情を認識する感情エンジンを使用して文書の表現を調整する手段とを具備するシステム。
（付記３）
付記１に記載のシステムであって、通話中にユーザの感情を認識する感情エンジンを使用して、応答や対話の内容をユーザの感情に合わせて調整する手段とを具備するシステム。
(Appendix 1)
A system comprising a means by which a generative AI responds to an incoming call from an unregistered or withheld phone number, and a means for documenting the matter of the call using ChatGPT after the call ends and sending it via a messenger app or email, the system further comprising a means for combining an emotion engine that recognizes the user's emotions.
(Appendix 2)
The system according to Appendix 1, further comprising a means for adjusting the wording of the document using an emotion engine that recognizes the user's emotions when, after the call ends, the matter of the call is documented using ChatGPT and sent via a messenger app or email.
(Appendix 3)
The system according to Appendix 1, further comprising a means for adjusting responses and dialogue content to match the user's emotions using an emotion engine that recognizes the user's emotions during the call.

１０、２１０、３１０、４１０データ処理システム
１２データ処理装置
１４スマートデバイス
２１４スマート眼鏡
３１４ヘッドセット型端末
４１４ロボット
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset type terminal
414 Robot

Claims

1. A system comprising:
a response means by which a generative AI responds when a call is received from an unregistered caller or from a withheld number;
a sending means for converting the contents of the call into text using voice recognition technology, summarizing and documenting the converted text using generative AI, and sending the result via a messenger app or email;
a connection means by which, when a call is received, the AI receives the secret code used by a delivery company, confirms that it is the correct code, and, if so, connects the call to the user; and
a linking means for, when a signal transmitted through a communication network includes caller information during a call, checking the telephone number indicated by the caller information, referring to a database that lists numbers with a history of misuse and that is included in the system, and notifying the police if the incoming number is a number with a history of misuse,
wherein the response means estimates the emotions of the caller from the tone of voice and manner of speaking, and responds according to the estimation result.

2. The system according to claim 1, wherein the sending means sends the summary to a family member when sending the summary via a messenger app or by email.

3. The system according to claim 1, wherein the linking means notifies an emergency contact designated by the user when reporting to the police.

4. The system according to claim 1, wherein the connection means performs authentication by having a delivery company scan a QR code or an NFC tag, and connects the call directly to the user.

5. The system according to any one of claims 1 to 4, wherein the sending means has a function of highlighting important keywords and phrases in the document in addition to documenting the contents of the call.

6. The system according to any one of claims 1 to 4, further comprising:
an emotion recognition means for estimating an emotion of the other party to the call based on the converted text; and
an adjustment means for adjusting the documentation of the call contents based on the emotion of the other party estimated by the emotion recognition means.