JP2023155684A

JP2023155684A - Program, information processing device and method

Info

Publication number: JP2023155684A
Application number: JP2022065159A
Authority: JP
Inventors: 健一郎阿部; Kenichiro Abe; 勝敏石川; Katsutoshi Ishikawa; 正稔三枝; Masatoshi Saegusa; 秀成神酒; Hidenari Miki; 光治松生; Mitsuharu Matsuo
Original assignee: ARP CO Ltd
Current assignee: ARP CO Ltd
Priority date: 2022-04-11
Filing date: 2022-04-11
Publication date: 2023-10-23
Anticipated expiration: 2042-04-11
Also published as: JP2023155890A; JP7254316B1

Abstract

PROBLEM TO BE SOLVED: To achieve highly secure voice authentication at high speed. SOLUTION: A program executed by a computer including a processor and a memory causes the processor to execute the steps of: extracting a first voice feature from first voice data of a pre-registered user; generating a temporary password; presenting the password to the user; receiving input of second voice data in which the user reads the password aloud; extracting a second voice feature from the received second voice data; and performing user authentication using the first voice feature, the second voice feature, and the password. The first voice feature and the second voice feature are represented by vectors. SELECTED DRAWING: Figure 7

Description

The present disclosure relates to a program, an information processing device, and a method.

To achieve safe and reliable dual identity authentication, a known technique learns and registers a user voiceprint model in advance and generates a dynamic password. Based on the password speech signal obtained when the requester reads the dynamic password aloud, it calculates an overall identity confidence for the requester using a global character acoustic model and the user voiceprint model, and determines the requester's identity from the calculated overall identity confidence (Patent Document 1).

Patent Document 1: Japanese Translation of PCT International Application Publication No. 2018-509649 (JP 2018-509649 A)

However, the conventional technique is slow: it requires a global voiceprint model to be trained in advance, computes the confidence with methods used in conventional speech recognition such as the Viterbi algorithm, and calculates an average confidence value.

An object of the present disclosure is to realize highly secure voice authentication at high speed.

A program according to one aspect of the present disclosure is executed by a computer including a processor and a memory. The program causes the processor to execute the steps of: extracting a first voice feature from first voice data of a user registered in advance; generating a temporary password; presenting the password to the user; receiving input of second voice data in which the user reads the password aloud; extracting a second voice feature from the received second voice data; and performing user authentication using the first voice feature, the second voice feature, and the password. The first voice feature and the second voice feature are voice features represented by vectors.

According to the present disclosure, highly secure voice authentication can be realized at high speed.

FIG. 1 is a block diagram showing the configuration of an information processing system 1 of the present disclosure.
FIG. 2 is a block diagram showing the configuration of an information processing device 10 of the present disclosure.
FIG. 3 is a block diagram showing the functional configuration of the information processing device 10 of the present disclosure.
FIG. 4 is a flowchart showing the first voice data collection process of the present disclosure.
FIG. 5 is a flowchart showing the learning process of the present disclosure.
FIG. 6 is a flowchart showing the first voice feature extraction process of the present disclosure.
FIG. 7 is a flowchart showing the authentication process of the present disclosure.
FIG. 8 is a flowchart showing the user authentication process in step S405 of the authentication process of the present disclosure.
FIG. 9 is a block diagram showing the configuration of an information processing system 2 of the present disclosure.
FIG. 10 is a flowchart showing an authentication process of the present disclosure.
FIG. 11 is a block diagram showing the configuration of an information processing system 3 of the present disclosure.
FIG. 12 is a flowchart showing an authentication process of the present disclosure.
FIG. 13 is a block diagram showing the configuration of an information processing system 4 of the present disclosure.
FIG. 14 is a flowchart showing an authentication process of the present disclosure.
FIG. 15 is a block diagram showing the configuration of an information processing system 5 of the present disclosure.
FIG. 16 is a flowchart showing an authentication process of the present disclosure.
FIG. 17 is a block diagram showing the configuration of an information processing system 6 of the present disclosure.
FIG. 18 is a flowchart showing an authentication process of the present disclosure.

Hereinafter, one embodiment of the present disclosure will be described in detail with reference to the drawings. In the drawings for explaining the embodiments, the same components are in principle designated by the same reference numerals, and repeated explanations thereof are omitted.

Conventional techniques require a global voiceprint model to be trained in advance, compute the confidence with methods used in conventional speech recognition such as the Viterbi algorithm, and calculate an average confidence value; as a result, processing is slow.

The technology of the present disclosure extracts a first voice feature from first voice data of a user registered in advance, generates a temporary password, and presents the password to the user. It then extracts a second voice feature from second voice data in which the user reads the password aloud, and performs user authentication using the first voice feature, the second voice feature, and the password. The first voice feature and the second voice feature are voice features represented by vectors. In this way, the present disclosure provides a technology that realizes highly secure voice authentication at high speed.

Conventional techniques also have the problems that processing is slow and that they are difficult for service providers and service users to use. For example, in accommodation services such as hotels, guests may have to check in manually at the front desk, and even when the provider does not want users to be conscious of being authenticated, users may still notice that authentication has taken place. The present disclosure discloses voice authentication technology that is highly convenient for each usage scene.

In the following, the first embodiment describes the voice authentication technology of the present disclosure, and the second to sixth embodiments describe specific examples of highly convenient voice authentication techniques for particular usage scenes.

<First embodiment>
(1) Configuration of information processing system 1
FIG. 1 is a block diagram showing the configuration of the information processing system 1 according to the first embodiment. As shown in FIG. 1, the information processing system 1 includes an information processing device 10, a user terminal 20, and a network 30. The information processing device 10 and the user terminal 20 are communicably connected to each other via the network 30 using a wired or wireless communication standard.

The information processing device 10 is realized by a stationary PC (Personal Computer), a laptop PC, or the like.

FIG. 2 is a block diagram showing the configuration of the information processing device 10 of the first embodiment. As shown in FIG. 2, the information processing device 10 includes a storage device 11, a processor 12, an input/output interface 13, and a communication interface 14.

The storage device 11 temporarily stores programs and data processed by the programs. The storage device 11 is realized by one or a combination of, for example, a memory such as a flash memory or a DRAM (Dynamic Random Access Memory), an HDD (Hard Disk Drive), and an SSD (Solid State Drive).

The processor 12 is hardware for executing an instruction set described in a program, and is composed of an arithmetic unit, registers, peripheral circuits, and the like.

The input/output interface 13 is an interface that receives input signals from input devices (not shown), for example a microphone, a touch panel, a touch pad, a pointing device such as a mouse, or a keyboard. The input/output interface 13 is also an interface that transmits output signals to output devices (not shown), such as a display or a speaker.

The communication interface 14 is an interface for inputting and outputting signals so that the information processing device 10 can communicate with external devices.

The user terminal 20 is a terminal device operated by the user or operated on the user's behalf. The user is, for example, a user of a service. When operated by the user, the user terminal 20 is either a terminal device owned by the user, or a terminal device provided to the user by the service provider and owned by that provider. When operated on the user's behalf, the user terminal 20 is a terminal device owned by the service provider.

The user terminal 20 is realized by, for example, a mobile terminal such as a smartphone or tablet compatible with a mobile communication system, or a wearable device. Alternatively, the user terminal 20 may be a stationary PC (Personal Computer), a laptop PC, or the like. In the present disclosure, the case where the user terminal 20 is a smartphone is described as an example.

FIG. 2 is a block diagram showing the configuration of the user terminal 20. As shown in FIG. 2, the user terminal 20 includes a storage device 21, a processor 22, an input/output interface 23, and a communication interface 24. The user terminal 20 also includes output devices (not shown) such as a display and a speaker.

The storage device 21 temporarily stores programs and data processed by the programs. The storage device 21 is realized by one or a combination of, for example, a memory such as a flash memory or a DRAM (Dynamic Random Access Memory), an HDD (Hard Disk Drive), and an SSD (Solid State Drive).

The processor 22 is hardware for executing an instruction set described in a program, and is composed of an arithmetic unit, registers, peripheral circuits, and the like.

The input/output interface 23 is an interface that receives input signals from input devices (not shown), for example a microphone, a touch panel, a touch pad, a pointing device such as a mouse, or a keyboard. The input/output interface 23 is also an interface that transmits output signals to output devices (not shown), such as a display or a speaker.

The communication interface 24 is an interface for inputting and outputting signals so that the user terminal 20 can communicate with external devices.

(2) Functions of the information processing device 10
FIG. 3 is a block diagram showing the functional configuration of the information processing device 10 of the first embodiment. As shown in FIG. 3, the information processing device 10 includes a communication unit 110, a storage unit 120, and a control unit 130.

The communication unit 110 performs processing that allows the information processing device 10 to communicate with external devices.

The storage unit 120 stores data and programs used by the information processing device 10, including a first data DB 121, a second data DB 122, and a third data DB 123.

The first data DB 121 is a database that holds first voice data. For example, each record in the first data DB 121 includes the items "ID", "user ID", and "first voice data". These items are not exhaustive; other items may be included.

The item "ID" stores information for identifying each record.

The item "user ID" stores information for identifying a user. The user ID is used in the same way in the other DBs of the present disclosure.

The item "first voice data" stores voice data uttered by the user. The voice data is, for example, data represented as an audio file such as a WAV file.

The second data DB 122 is a database that holds a trained model and the parameters of the trained model. The trained model is described later.

The third data DB 123 is a database that holds the user's first voice feature, described later. For example, each record in the third data DB 123 includes the items "user ID", "user name", "voice feature", and "update date and time". These items are not exhaustive; other items may be included.

The item "user name" stores information such as the user's given name, family name, full name, designation, or alias.

The item "voice feature" stores the extracted first voice feature of the user. The first voice feature is described later.

The item "update date and time" holds the date and time at which the first voice feature was stored in the third data DB 123.
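As an illustration of the third data DB 123 record layout, the following minimal sketch stores and loads a user's first voice feature with Python's built-in sqlite3 module. The table name `third_data`, its column names, and the choice to serialize the N-dimensional vector as JSON text are assumptions for this example, not details given in the disclosure.

```python
import json
import sqlite3
from datetime import datetime, timezone
from typing import List, Optional


def init_third_data_db(conn: sqlite3.Connection) -> None:
    # Hypothetical table mirroring the items "user ID", "user name",
    # "voice feature", and "update date and time".
    conn.execute(
        """CREATE TABLE IF NOT EXISTS third_data (
               user_id TEXT PRIMARY KEY,
               user_name TEXT,
               voice_feature TEXT NOT NULL,  -- N-dim vector serialized as JSON
               updated_at TEXT NOT NULL      -- ISO 8601 timestamp
           )"""
    )


def store_first_feature(conn: sqlite3.Connection, user_id: str, user_name: str,
                        feature: List[float],
                        now: Optional[datetime] = None) -> None:
    """Insert or overwrite a user's first voice feature and refresh updated_at."""
    if now is None:
        now = datetime.now(timezone.utc)
    with conn:  # commits the transaction on success
        conn.execute(
            "INSERT OR REPLACE INTO third_data "
            "(user_id, user_name, voice_feature, updated_at) VALUES (?, ?, ?, ?)",
            (user_id, user_name, json.dumps(feature), now.isoformat()),
        )


def load_first_feature(conn: sqlite3.Connection, user_id: str) -> Optional[List[float]]:
    """Return the stored feature vector, or None if the user is not registered."""
    row = conn.execute(
        "SELECT voice_feature FROM third_data WHERE user_id = ?", (user_id,)
    ).fetchone()
    return None if row is None else json.loads(row[0])
```

Keeping one row per user ID (primary key plus `INSERT OR REPLACE`) is one way to make re-registration overwrite the previous feature; a deployment could equally keep a history of features.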

The control unit 130 implements the functions of a reception control unit 131, a transmission control unit 132, an extraction unit 133, a generation unit 134, a presentation unit 135, an authentication unit 136, and the like, when the processor 12 of the information processing device 10 executes processing according to a program.

The reception control unit 131 controls processing by which the information processing device 10 receives signals from external devices according to a communication protocol. For example, when the reception control unit 131 receives, from the user terminal 20, second voice data in which the user reads aloud the password described later, it passes the second voice data to the extraction unit 133.

The transmission control unit 132 controls processing by which the information processing device 10 transmits signals to external devices according to a communication protocol.

The extraction unit 133 extracts voice features from voice data.
Specifically, the extraction unit 133 extracts a voice feature using the trained model and one or more pieces of first voice data of the user registered in the first data DB 121.

The trained model of the present disclosure is described here. The trained model is a model trained to output, in response to input voice data, a voice feature that represents the user's voice characteristics as a vector. Any machine learning model, any neural network, or the like can be used as the trained model.

In the present disclosure, the case where the trained model is a deep metric learning model is described as an example. In this case, the trained model is trained so that, when a voice feature obtained by converting voice data into a two-dimensional feature is input, it outputs a voice feature represented by an N-dimensional vector. The two-dimensional feature is, for example, the mel-frequency cepstral coefficients (MFCC) of the voice. The model may be trained by the information processing device 10 or by another device. In the present disclosure, the case where the model is trained by another device and stored in advance in the second data DB 122 is described as an example.

The training data used to train the model is prepared in advance by applying at least one type of audio signal processing or acoustic processing to voice data of multiple people recorded for training. Examples of such processing include volume adjustment, time stretching and compression, pitch shifting, noise addition, equalization, and reverb. Applying several such processes to one person's voice data yields multiple pieces of voice data from that person. This processing is performed to account for differences in the recording environment of the voice data. With such training data, the trained model can extract voice features that are less affected by the recording environment, such as differences in microphone performance.
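The augmentation idea above can be sketched on a raw waveform (a list of float samples) with the standard library only. This is a minimal illustration that covers three of the listed operations (volume adjustment, a speed change, and noise addition); the parameter ranges are assumptions, and the speed change here is naive resampling, which also shifts pitch, rather than a true time stretch. Equalization, reverb, and independent pitch shifting are omitted.

```python
import math
import random
from typing import List


def adjust_volume(samples: List[float], gain_db: float) -> List[float]:
    """Scale the waveform by a gain given in decibels."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]


def change_speed(samples: List[float], rate: float) -> List[float]:
    """Naive speed change by linear-interpolation resampling.

    rate > 1 shortens the signal and rate < 1 lengthens it. Unlike a true
    time stretch, this also changes the pitch.
    """
    n_out = max(1, int(len(samples) / rate))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * rate
        j = min(int(pos), last)
        frac = pos - int(pos)
        a = samples[j]
        b = samples[min(j + 1, last)]
        out.append(a + (b - a) * frac)
    return out


def add_noise(samples: List[float], snr_db: float, rng: random.Random) -> List[float]:
    """Add white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    signal_power = sum(s * s for s in samples) / len(samples)
    sigma = math.sqrt(signal_power / (10 ** (snr_db / 10)))
    return [s + rng.gauss(0.0, sigma) for s in samples]


def augment(samples: List[float], rng: random.Random, n_variants: int = 3) -> List[List[float]]:
    """Produce several perturbed copies of one recording (ranges are assumptions)."""
    variants = []
    for _ in range(n_variants):
        x = adjust_volume(samples, rng.uniform(-6.0, 6.0))
        x = change_speed(x, rng.uniform(0.9, 1.1))
        x = add_noise(x, rng.uniform(10.0, 30.0), rng)
        variants.append(x)
    return variants
```

Seeding the `random.Random` instance makes the generated training set reproducible, which helps when comparing models trained on the same augmented data.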

The processed training data is then converted, using mel-frequency cepstral coefficients (MFCC), into a feature represented by a two-dimensional vector, for example a 128×128 two-dimensional array.
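Because the model expects a fixed-size input such as 128×128, utterances of different lengths must be brought to a common shape. The sketch below assumes an MFCC matrix has already been computed by some library (for example `librosa.feature.mfcc`) as a coefficients × frames list of lists, and simply crops or zero-pads both axes. Which axis holds coefficients versus time frames, and the zero padding value, are assumptions for illustration.

```python
from typing import List

Matrix = List[List[float]]


def fix_size(mfcc: Matrix, n_coeffs: int = 128, n_frames: int = 128,
             pad_value: float = 0.0) -> Matrix:
    """Crop or pad a (coefficients x frames) matrix to (n_coeffs x n_frames)."""
    rows = []
    for row in mfcc[:n_coeffs]:
        kept = list(row[:n_frames])                             # crop long utterances
        kept += [pad_value] * max(0, n_frames - len(kept))      # pad short ones
        rows.append(kept)
    while len(rows) < n_coeffs:                                 # pad missing coefficient rows
        rows.append([pad_value] * n_frames)
    return rows
```

Cropping keeps the first 128 frames; a training pipeline might instead take random crops of long recordings so the model sees different segments of the same speaker.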

The model is then trained to take the feature represented by the two-dimensional vector as input and output a voice feature represented by an N-dimensional vector, where N is an arbitrary integer. The N-dimensional voice feature is a vector whose elements describe the voice characteristics a person has. The training method is, for example, metric (distance) learning. As a result, the model learns to output an N-dimensional voice feature that characterizes each person's voice. Because the output voice feature is a vector, whether two utterances come from the same person can be determined accurately from its distance to another voice feature output in the same way. The model trained in this way is stored in the second data DB 122.
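The disclosure says only that metric learning is used, without naming a loss. One widely used metric-learning objective is the triplet loss, sketched below on plain Python lists: it is zero when an anchor embedding is closer to another embedding of the same speaker (positive) than to one of a different speaker (negative) by at least a margin. The margin value and the L2 normalization step are assumptions, and the network and gradient updates are not shown.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.2) -> float:
    """max(0, d(a, p) - d(a, n) + margin) on L2-normalized embeddings."""
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(0.0, euclidean(a, p) - euclidean(a, n) + margin)
```

Minimizing this loss over many (anchor, positive, negative) triples pulls each speaker's embeddings together and pushes different speakers apart, which is what makes the distance threshold used later in speaker authentication meaningful.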

The training data may also include adversarial noise. By performing adversarial training when training the model, the trained model can be made resistant to adversarial attacks such as spoofing.

Specifically, the extraction unit 133 acquires the user's first voice data from the first data DB 121. Next, the extraction unit 133 converts the first voice data, using mel-frequency cepstral coefficients (MFCC), into a third voice feature represented by a two-dimensional vector. Next, the extraction unit 133 acquires the trained model from the second data DB 122. The extraction unit 133 then uses the third voice feature and the trained model to extract a first voice feature represented by an N-dimensional vector, and stores the extracted first voice feature in the third data DB 123.

When the extraction unit 133 receives second voice data, that is, the user's voice reading aloud the password described later, it converts the second voice data, using mel-frequency cepstral coefficients, into a fourth voice feature represented by a two-dimensional vector. Next, the extraction unit 133 uses the fourth voice feature and the trained model to extract a second voice feature represented by an N-dimensional vector.
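Both extraction paths share the same two stages: waveform to a two-dimensional MFCC feature (the third or fourth voice feature), then the trained model to an N-dimensional vector (the first or second voice feature). The sketch below expresses that pipeline with the two stages passed in as callables, since the disclosure does not fix a particular MFCC implementation or model framework; `to_mfcc` and `model` are hypothetical placeholders. Averaging the vectors when several pieces of first voice data are registered is also an assumption, as the disclosure does not say how multiple recordings are combined.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]
Vector = List[float]


def extract_voice_feature(samples: Sequence[float],
                          to_mfcc: Callable[[Sequence[float]], Matrix],
                          model: Callable[[Matrix], Sequence[float]]) -> Vector:
    """Waveform -> 2-D MFCC feature -> N-dimensional embedding."""
    two_d = to_mfcc(samples)   # third (enrollment) or fourth (verification) feature
    return list(model(two_d))  # first or second voice feature


def enroll(recordings: List[Sequence[float]],
           to_mfcc: Callable[[Sequence[float]], Matrix],
           model: Callable[[Matrix], Sequence[float]]) -> Vector:
    """Hypothetical enrollment: average the embeddings of one or more recordings."""
    vectors = [extract_voice_feature(r, to_mfcc, model) for r in recordings]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
```

Using the identical function for enrollment and verification matters: the first and second voice features are only comparable by distance if they come from the same preprocessing and the same trained model.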

The generation unit 134 generates a temporary password.

Specifically, the generation unit 134 generates a password with a predetermined expiration time. The password is a character string that can be read aloud. The generation unit 134 may generate the password as a random character string, or may select it from predetermined words or sentences, either at random or according to a predetermined rule. A random character string may be difficult for a person to read aloud. Therefore, user authentication by voice is more accurate when the generation unit 134 selects the password from words or sentences prepared in advance that are easy to read aloud.
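A minimal sketch of a temporary password drawn from easy-to-read words, using Python's `secrets` module for unpredictable selection. The word list, the two-word length, and the 60-second validity period are assumptions for illustration; the disclosure specifies only that the password is readable aloud and has a predetermined expiration.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence

# Hypothetical list of easy-to-read words; the disclosure does not give one.
WORD_LIST = ["hokutoshichisei", "sakura", "fujisan", "asagao", "tanabata"]


@dataclass(frozen=True)
class TemporaryPassword:
    text: str
    expires_at: datetime

    def is_valid(self, now: Optional[datetime] = None) -> bool:
        """True while the password is within its expiration time."""
        if now is None:
            now = datetime.now(timezone.utc)
        return now <= self.expires_at


def generate_password(words: Sequence[str] = WORD_LIST, n_words: int = 2,
                      ttl: timedelta = timedelta(seconds=60),
                      now: Optional[datetime] = None) -> TemporaryPassword:
    """Pick n_words words with a CSPRNG and attach an expiration time."""
    if now is None:
        now = datetime.now(timezone.utc)
    text = " ".join(secrets.choice(words) for _ in range(n_words))
    return TemporaryPassword(text=text, expires_at=now + ttl)
```

Combining several words from a small list trades some entropy for readability; a larger list or more words raises the number of possible passwords without making them harder to pronounce.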

The presentation unit 135 presents the password to the user.

Specifically, the presentation unit 135 presents the password to the user in a form that a person can perceive and, having perceived it, say aloud. For example, the presentation unit 135 causes an output device to display the password as a character string, causes an output device to display an image or video from which the password can be recognized, or causes an output device to emit a sound related to the password. When displaying an image or video, if the password is, for example, "hokutoshichisei" (the Big Dipper), the presentation unit 135 causes an output device (for example, a display) to show an image or video of the Big Dipper. When emitting a sound, if the password is "hokutoshichisei", for example, the presentation unit 135 generates audio of the password being read aloud by any method and causes an output device (for example, a speaker) to play it.

If the user is not near an output device connected to the information processing device 10, the presentation unit 135 may present the password by transmitting it via communication to the user terminal 20 or another device.

The authentication unit 136 performs user authentication using the first voice feature, the second voice feature, and the password.

Specifically, when the password has not expired, the authentication unit 136 performs password authentication and speaker authentication, and determines that user authentication has succeeded if both succeed. Password authentication verifies that the second voice data is a reading of the password. Speaker authentication verifies that the second voice data was uttered by the registered user.

First, the authentication unit 136 converts the received second voice data into text data. Next, the authentication unit 136 performs password authentication using the password and the text data. More specifically, the authentication unit 136 verifies whether the text data converted from the second voice data matches the password. If they match, the authentication unit 136 determines that password authentication has succeeded; if they do not match, it determines that password authentication has failed.
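The match check can be sketched as a comparison after normalization, assuming the transcript has already been produced by some speech recognizer (not shown). The normalization rules here (Unicode NFKC, case folding, dropping whitespace and punctuation) are assumptions chosen so that harmless formatting differences, such as full-width letters or a trailing period from the recognizer, do not cause a mismatch; the disclosure says only that the text and password must match.

```python
import unicodedata


def normalize(text: str) -> str:
    """NFKC-normalize, case-fold, and keep only letters and digits."""
    text = unicodedata.normalize("NFKC", text).casefold()
    return "".join(ch for ch in text if ch.isalnum())


def password_authenticated(transcript: str, password: str) -> bool:
    """Password authentication: does the recognized text match the password?"""
    return normalize(transcript) == normalize(password)
```

Hiragana and katakana count as alphanumeric in Python, so a Japanese password such as "ほくとしちせい" passes through the filter intact.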

The authentication unit 136 also performs speaker authentication according to the distance between the first voice feature and the second voice feature. More specifically, the authentication unit 136 first calculates the distance between the first voice feature and the second voice feature, and then determines whether the calculated distance is less than or equal to a predetermined threshold. If it is, the authentication unit 136 determines that speaker authentication has succeeded; if the distance exceeds the threshold, it determines that speaker authentication has failed.
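The distance-and-threshold rule can be written in a few lines. The disclosure does not name the distance metric, so Euclidean distance is assumed here (cosine distance is a common alternative for embeddings), and the threshold is a deployment-specific value that would be tuned on held-out data.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("voice features must have the same dimension N")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def speaker_authenticated(first_feature: Sequence[float],
                          second_feature: Sequence[float],
                          threshold: float) -> bool:
    """Succeeds when the distance is less than or equal to the threshold."""
    return euclidean_distance(first_feature, second_feature) <= threshold
```

Lowering the threshold reduces false acceptances at the cost of more false rejections of the genuine user, so its value sets the security/convenience trade-off.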

The authentication unit 136 determines that user authentication has succeeded when the password has not expired and both password authentication and speaker authentication have succeeded. If the password has expired, password authentication has failed, or speaker authentication has failed, the authentication unit 136 determines that user authentication has failed.
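Putting the three conditions together, a minimal decision function might check the expiration first and skip the other checks once any condition fails, returning a reason code. The `AuthResult` names, the whitespace-and-case normalization, and the use of Euclidean distance (`math.dist`) are assumptions for this sketch.

```python
import math
import unicodedata
from datetime import datetime
from enum import Enum
from typing import Sequence


class AuthResult(Enum):
    SUCCESS = "success"
    PASSWORD_EXPIRED = "password_expired"
    PASSWORD_MISMATCH = "password_mismatch"
    SPEAKER_MISMATCH = "speaker_mismatch"


def _norm(text: str) -> str:
    # Assumed normalization: NFKC, case folding, whitespace removed.
    return "".join(unicodedata.normalize("NFKC", text).casefold().split())


def authenticate_user(password: str, expires_at: datetime, transcript: str,
                      first_feature: Sequence[float],
                      second_feature: Sequence[float],
                      threshold: float, now: datetime) -> AuthResult:
    """User authentication succeeds only if all three checks pass."""
    if now > expires_at:                               # password must be unexpired
        return AuthResult.PASSWORD_EXPIRED
    if _norm(transcript) != _norm(password):           # password authentication
        return AuthResult.PASSWORD_MISMATCH
    if math.dist(first_feature, second_feature) > threshold:  # speaker authentication
        return AuthResult.SPEAKER_MISMATCH
    return AuthResult.SUCCESS
```

Returning a reason rather than a bare boolean is convenient for logging, though a production system might show the user only a generic failure message.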

The authentication unit 136 need not use the distance between the first voice feature and the second voice feature directly. For example, the authentication unit 136 may perform user authentication using the second voice feature and an anomaly detection model for the user trained with the first voice feature. The user's anomaly detection model takes a voice feature as input and outputs whether that voice feature belongs to the user. Existing models such as OneClassSVM and IsolationForest can be used as the anomaly detection model. The anomaly detection model is trained for each user using the voice information obtained at user registration.
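With scikit-learn, a per-user OneClassSVM would be fitted on the user's enrollment vectors and its `predict` method returns +1 for inliers and -1 for outliers. To keep the example dependency-free, the sketch below uses a deliberately simple stand-in instead: a per-user detector that accepts vectors within mean + k × standard deviation of the enrollment distances from the enrollment centroid. This is not OneClassSVM or IsolationForest, and the value of k is an assumption; it only illustrates the fit-per-user, then ask "is this the user?" interface.

```python
import math
from statistics import mean, pstdev
from typing import List, Optional, Sequence


class CentroidAnomalyDetector:
    """Simple per-user novelty detector (stand-in for OneClassSVM etc.)."""

    def __init__(self, k: float = 3.0) -> None:
        self.k = k
        self.centroid: Optional[List[float]] = None
        self.limit: Optional[float] = None

    def fit(self, vectors: List[Sequence[float]]) -> "CentroidAnomalyDetector":
        """Learn from the user's enrollment voice features."""
        dim = len(vectors[0])
        self.centroid = [mean(v[i] for v in vectors) for i in range(dim)]
        dists = [math.dist(v, self.centroid) for v in vectors]
        self.limit = mean(dists) + self.k * pstdev(dists)
        return self

    def is_user(self, vector: Sequence[float]) -> bool:
        """True if the second voice feature looks like this user's voice."""
        if self.centroid is None or self.limit is None:
            raise RuntimeError("call fit() first")
        return math.dist(vector, self.centroid) <= self.limit
```

A learned model such as OneClassSVM can capture non-spherical regions of a user's feature space, which a single centroid and radius cannot, at the cost of needing more enrollment samples.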

The authentication unit 136 then outputs the authentication result. The result may be output to, for example, an output device connected to the information processing device 10, the user terminal 20 (by transmission), or a server that provides a service.

(3) Operation
Processing in the information processing device 10 is described below with reference to the drawings.

FIG. 4 is a flowchart illustrating an example of the flow of the first voice data collection process performed by the information processing device 10. The information processing device 10 executes this process at any timing (for example, when the first voice data is received).

In step S101, the reception control unit 131 receives the first voice data from the user terminal 20.

In step S102, the reception control unit 131 stores the received first voice data in the first data DB 121 and ends the process.

FIG. 5 is a flowchart illustrating an example of the flow of the learning process performed by the information processing device 10. The information processing device 10 executes this process at any timing (for example, when a signal to start the learning process is received). When the information processing device 10 performs the learning process, a learning unit (not shown) carries it out.

In step S201, the learning unit acquires training data from the storage unit 120 or an external database.

In step S202, the learning unit trains the model so that, given voice data as input, it outputs a voice feature representing the user's voice characteristics as a vector.

In step S203, the learning unit stores the trained model and its parameters in the second data DB 122 and ends the process.
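The source does not state the objective used in step S202. Elsewhere in the document, the trained model is described as a convolutional neural network such as a deep metric learning model. As an assumption, one common metric-learning objective is the triplet loss. It pulls an anchor embedding toward another utterance by the same speaker and pushes it away from a different speaker's utterance by at least a margin. The sketch below computes only this loss for given embeddings. The network and optimizer are omitted, and the `margin` value is hypothetical.

```python
import math
from typing import Sequence


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.2) -> float:
    """max(0, d(anchor, positive) - d(anchor, negative) + margin), Euclidean d."""
    d_pos = math.dist(anchor, positive)   # same speaker: should be small
    d_neg = math.dist(anchor, negative)   # different speaker: should be large
    return max(0.0, d_pos - d_neg + margin)


# Well-separated triplet -> zero loss; negative as close as positive -> margin.
print(triplet_loss([0, 0], [0, 1], [0, 3]))  # 0.0
print(triplet_loss([0, 0], [0, 1], [0, 1]))  # 0.2
```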

FIG. 6 is a flowchart illustrating an example of the flow of the first voice feature extraction process performed by the information processing device 10. The information processing device 10 executes this process at any timing (for example, when the user's first voice data is stored or when an authentication request signal is received).

In step S301, the extraction unit 133 acquires the user's first voice data from the first data DB 121.

In step S302, the extraction unit 133 uses Mel-frequency cepstral coefficients (MFCC) to convert the first voice data into a third voice feature represented by a two-dimensional vector.

In step S303, the extraction unit 133 acquires the trained model from the second data DB 122.

In step S304, the extraction unit 133 uses the third voice feature and the trained model to extract the first voice feature, represented by an N-dimensional vector.

In step S305, the extraction unit 133 stores the extracted first voice feature in the third data DB 123 and ends the process.
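Steps S301 to S305 form a short pipeline. In the sketch below, the databases are plain dictionaries, and the MFCC conversion and the trained model are passed in as functions. All of these names are hypothetical. In practice, `to_mfcc` could wrap a library routine such as `librosa.feature.mfcc`, which returns a two-dimensional array of coefficients by frames. The toy stand-ins at the end exist only to make the example runnable.

```python
from typing import Callable, Dict, List, Sequence

Matrix = List[List[float]]   # two-dimensional MFCC representation (third voice feature)
Vector = List[float]         # N-dimensional embedding (first voice feature)


def extract_first_feature(user_id: str,
                          first_db: Dict[str, Sequence[float]],
                          to_mfcc: Callable[[Sequence[float]], Matrix],
                          model: Callable[[Matrix], Vector],
                          third_db: Dict[str, Vector]) -> Vector:
    """Hypothetical version of the FIG. 6 flow."""
    audio = first_db[user_id]             # S301: fetch registered first voice data
    third_feature = to_mfcc(audio)        # S302: MFCC -> 2-D third voice feature
    # S303: the trained model is supplied by the caller (loaded from DB 122).
    first_feature = model(third_feature)  # S304: embed into N dimensions
    third_db[user_id] = first_feature     # S305: store in the third data DB 123
    return first_feature


def toy_mfcc(audio: Sequence[float]) -> Matrix:
    """Toy stand-in: splits samples into frames of two values."""
    return [list(audio[i:i + 2]) for i in range(0, len(audio), 2)]


def toy_model(frames: Matrix) -> Vector:
    """Toy stand-in: returns [frame count, sum of all values]."""
    return [float(len(frames)), float(sum(sum(row) for row in frames))]


first_db = {"u1": [1.0, 2.0, 3.0, 4.0]}
third_db: Dict[str, Vector] = {}
print(extract_first_feature("u1", first_db, toy_mfcc, toy_model, third_db))  # [2.0, 10.0]
```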

FIG. 7 is a flowchart illustrating an example of the flow of the user authentication process performed by the information processing device 10. The information processing device 10 executes this process at any timing (for example, when an authentication request signal is received).

In step S401, the reception control unit 131 receives an authentication request signal. The authentication request signal is input from, for example, an input terminal connected to the information processing device 10 or from the user terminal 20.

In step S402, the generation unit 134 generates a temporary password.

In step S403, the presentation unit 135 presents the password to the user.

In step S404, the reception control unit 131 receives the second voice data.

In step S405, the authentication unit 136 performs user authentication using the first voice feature, the second voice feature, and the password.

In step S406, the authentication unit 136 outputs the authentication result and ends the process.
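The FIG. 7 flow, together with temporary password generation, might look like the sketch below. The source only says that the password is temporary and has an expiration, so the six-digit numeric format and the five-minute lifetime are assumptions. The `secrets` module is used because the password acts as a security token. The callback names are hypothetical.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass(frozen=True)
class TemporaryPassword:
    text: str
    expires_at: datetime


def generate_temporary_password(now: datetime, digits: int = 6,
                                ttl: timedelta = timedelta(minutes=5)) -> TemporaryPassword:
    """S402: random numeric password with an assumed expiry."""
    text = "".join(secrets.choice("0123456789") for _ in range(digits))
    return TemporaryPassword(text, now + ttl)


def handle_authentication_request(
        user_id: str,
        present: Callable[[str, str], None],
        receive_voice: Callable[[str], object],
        authenticate: Callable[[str, object, TemporaryPassword], bool],
        output: Callable[[str, bool], None],
        now: Optional[datetime] = None) -> bool:
    """S402-S406, called after an authentication request signal (S401) arrives."""
    if now is None:
        now = datetime.now()
    password = generate_temporary_password(now)            # S402
    present(user_id, password.text)                         # S403
    second_voice = receive_voice(user_id)                   # S404
    result = authenticate(user_id, second_voice, password)  # S405
    output(user_id, result)                                 # S406
    return result
```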

FIG. 8 is a flowchart illustrating an example of the flow of the authentication process performed by the authentication unit 136 in step S405.

In step S451, the extraction unit 133 uses Mel-frequency cepstral coefficients to convert the second voice data into a fourth voice feature represented by a two-dimensional vector.

In step S452, the extraction unit 133 uses the fourth voice feature and the trained model to extract the second voice feature, represented by an N-dimensional vector.

In step S463, the authentication unit 136 acquires, from the third data DB 123, the first voice feature of the user associated with the authentication request signal.

In step S464, the authentication unit 136 converts the received second voice data into text data.

In step S465, the authentication unit 136 performs password authentication using the password and the text data.

In step S466, the authentication unit 136 performs speaker authentication according to the distance between the first voice feature and the second voice feature.

In step S467, the authentication unit 136 determines that user authentication has succeeded if the password is within its validity period and both password authentication and speaker authentication have succeeded. Otherwise, it determines that user authentication has failed. It then returns the authentication result.
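The FIG. 8 steps can be combined into a single function, sketched below under several assumptions:

- Speech recognition (`transcribe`), MFCC conversion, and the trained model are injected as hypothetical functions.
- Euclidean distance is used for speaker authentication.
- Before comparison with the password, the recognized text is normalized with Unicode NFKC, which folds full-width digits such as "１２３" to ASCII, and whitespace is stripped.

Recognizers that output numbers as words or kanji would need further normalization.

```python
import math
import unicodedata
from datetime import datetime
from typing import Callable, Dict, List, Sequence


def normalize(text: str) -> str:
    """NFKC-fold (full-width -> ASCII), drop all whitespace, lowercase."""
    return "".join(unicodedata.normalize("NFKC", text).split()).lower()


def authenticate(user_id: str,
                 second_audio: Sequence[float],
                 password_text: str,
                 expires_at: datetime,
                 now: datetime,
                 third_db: Dict[str, List[float]],
                 to_mfcc: Callable[[Sequence[float]], List[List[float]]],
                 model: Callable[[List[List[float]]], List[float]],
                 transcribe: Callable[[Sequence[float]], str],
                 threshold: float) -> bool:
    """Hypothetical version of the FIG. 8 flow; returns the authentication result."""
    fourth_feature = to_mfcc(second_audio)                              # S451
    second_feature = model(fourth_feature)                              # S452
    first_feature = third_db[user_id]                                   # S463
    text = transcribe(second_audio)                                     # S464
    password_ok = normalize(text) == normalize(password_text)           # S465
    speaker_ok = math.dist(first_feature, second_feature) <= threshold  # S466
    return now <= expires_at and password_ok and speaker_ok             # S467
```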

Although the above processes have been described individually, they are not limited to this. For example, the information processing system 1 may execute them in combination.

(4) Summary
As described above, according to the present disclosure, a first voice feature is extracted from the first voice data of a pre-registered user, a temporary password is generated, and the password is presented to the user. A second voice feature is then extracted from second voice data in which the user reads out the password. The user is authenticated using the first voice feature, the second voice feature, and the password. The first voice feature and the second voice feature are voice features represented by vectors. The present disclosure thereby provides a technique that achieves highly secure voice authentication at high speed.

The trained model is a convolutional neural network, such as a deep metric learning model. The first voice feature and the second voice feature are each represented by a vector of a predetermined dimension N. Authentication can therefore rely on a simple calculation, namely the distance between the first voice feature and the second voice feature, so highly secure voice authentication can be achieved at high speed.

Although the password has been described as being generated by the generation unit, this is not limiting. The password may instead be acquired from a terminal that generated it. The same applies to the other embodiments.

<Second embodiment>
The second embodiment describes an example in which the above user authentication is used for authentication when a service is used. Specifically, the service provides a service provider's facilities to a user. Components identical to those of the first embodiment are given the same reference numerals, and their description is omitted.

(1) Configuration of the information processing system 2
FIG. 9 is a block diagram showing the configuration of the information processing system 2 of the second embodiment.

As shown in FIG. 9, the information processing system 2 includes the information processing device 10, the user terminal 20, the network 30, and a facility 40. The information processing device 10, the user terminal 20, and the facility 40 are communicably connected to one another via the network 30 using a wired or wireless communication standard.

The facility 40 is a facility provided to users by a service provider, such as a sports gym, a pool, a bathing facility, an office, or an accommodation facility. The facility 40 includes an information processing device 41 and a voice input device 42 installed at predetermined equipment. The information processing device 41 and the voice input device 42 are communicably connected to each other using a wired or wireless communication standard. If the facility 40 is a sports gym, for example, the predetermined equipment is a training room, a training machine, or the like. The information processing device 41 is also communicably connected to the information processing device 10 and other devices via the network 30 using a wired or wireless communication standard.

The information processing device 41 is, for example, an information processing device with a display. It has the following functions.
- A function of detecting, using infrared light or the like, whether a person is in front of the information processing device 41.
- A function of acquiring a password by communicating with the information processing device 10 when a person is detected in front of the information processing device 41.
- A function of outputting the password to an output device (for example, a display) connected to the information processing device 41.
- A function of transmitting the second voice data acquired from the voice input device 42 to the information processing device 10.
- A function of unlocking the predetermined equipment, or activating a device for providing the service, in response to receiving permission information (described later) from the information processing device 10.
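One pass of the information processing device 41's behavior can be sketched as a small control routine. The `AuthServer` interface and its method names are hypothetical. The callbacks for presence detection, display, recording, and unlocking are also hypothetical placeholders for hardware and network calls that the source does not specify.

```python
from typing import Callable, Protocol


class AuthServer(Protocol):
    """Hypothetical interface to the information processing device 10."""

    def request_password(self) -> str: ...

    def submit_voice(self, audio: bytes) -> bool: ...  # True = permission info received


def run_terminal_once(person_detected: Callable[[], bool],
                      server: AuthServer,
                      display: Callable[[str], None],
                      record_voice: Callable[[], bytes],
                      unlock: Callable[[], None]) -> bool:
    """Returns True if the equipment was unlocked (or the device activated)."""
    if not person_detected():           # e.g. infrared presence sensor
        return False
    password = server.request_password()
    display(password)                   # show the password on the connected display
    audio = record_voice()              # second voice data from voice input device 42
    if server.submit_voice(audio):      # permission information from device 10
        unlock()
        return True
    return False
```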

The voice input device 42 is a device through which the user inputs the second voice data. For example, the voice input device 42 has a microphone and converts the voice captured by the microphone into second voice data, which it passes to the information processing device 41.

(2) Functions of the information processing device 10

In this embodiment, the first voice data stored in the first data DB 121 is acquired before the user starts using the service. For example, if the service is a sports gym, the first voice data is acquired when the user and the service provider conclude a membership contract.

The presentation unit 135 presents a password to the user when the user uses the service. Specifically, when the reception control unit 131 receives a password request from the information processing device 41, the presentation unit 135 transmits the password to the information processing device 41. In this way, the presentation unit 135 presents the password to the user through the information processing device installed at the predetermined equipment in the facility 40.

When the reception control unit 131 receives the second voice data from the information processing device 41, the authentication unit 136 performs user authentication using the first voice feature, the second voice feature, and the password. The authentication unit 136 permits use of the service in response to successful user authentication. Specifically, when user authentication succeeds, the authentication unit 136 transmits permission information to the information processing device 41. The permission information is for unlocking the predetermined equipment or activating a device for providing the service.

FIG. 10 is a flowchart illustrating an example of the flow of the authentication process performed by the information processing device 10. The information processing device 10 executes this process, for example, when it receives a password request from the information processing device 41.

In step S501, the reception control unit 131 receives a password request from the information processing device 41.

In step S503, the presentation unit 135 presents a password to the user who is about to use the service.

In step S506, if user authentication has succeeded, the authentication unit 136 transmits permission information to the information processing device 41 and ends the process. The permission information is for unlocking the predetermined equipment or activating a device for providing the service.

(4) Summary
According to the present disclosure, use of the service is permitted in response to successful user authentication. The first voice data is acquired before the user starts using the service, and the password is presented to the user when the user uses the service. This enables voice authentication that is convenient for each usage scenario. For example, with the present disclosure, user authentication can be performed by voice alone. This solves several problems: a user's hands may be full of luggage; a user may not want to carry, deposit, or hand over a physical key; or one may not want to send a physical key or make multiple copies. The present disclosure is also convenient for users because they do not need to memorize a password.

In addition, the password is presented to the user by the information processing device installed at the predetermined equipment in the facility, and the second voice data is received through the voice input device installed at that equipment. In response to successful user authentication, the equipment is unlocked or a device for providing the service is activated. Users can therefore enter the equipment (such as a training room), or a device can start and become available, without the service provider deploying reception staff or the like. This also reduces the burden on the service provider.

<Third embodiment>
The third embodiment describes an example in which the above user authentication is used for authentication when a service is used. Specifically, the service is the provision of accommodation. Components identical to those of the first and second embodiments are given the same reference numerals, and their description is omitted.

(1) Configuration of the information processing system 3
FIG. 11 is a block diagram showing the configuration of the information processing system 3 of the third embodiment.

As shown in FIG. 11, the information processing system 3 includes the information processing device 10, the user terminal 20, the network 30, an accommodation facility 50, a first server 60, and a second server 70. These are communicably connected to one another via the network 30 using a wired or wireless communication standard.

The accommodation facility 50 is an accommodation facility provided to users by a service provider, such as a hotel or a Japanese inn (ryokan). The accommodation facility 50 includes an information processing device 41 and a voice input device 42 installed at predetermined equipment, for example a check-in terminal at the front desk.

The first server 60 is either a travel agency's server or a server that runs a web system provided by the travel agency. Alternatively, the first server 60 may be a server of the accommodation facility or a server that runs a web system provided by the accommodation facility. The following description takes the case where the first server 60 is a travel agency's server as an example.

When the user contracts with the travel agency for a trip that includes at least a stay at the accommodation facility 50, the travel agency acquires the user's first voice data, for example through a voice input device installed at the travel agency. A travel agency staff member then transmits the acquired first voice data to the first server 60.

The first server 60 transmits the first voice data and information on the stay period to the information processing device 10, either automatically or in response to a request from the information processing device 10. The first server 60 also transmits the user's information and information on the use of the accommodation facility 50 to the second server 70.

If the first server 60 runs a web system, it may instead acquire the user's first voice data from the user terminal 20.

The second server 70 manages the use of the accommodation facility 50, including users' check-in status. When the second server 70 receives information indicating a check-in from the information processing device 10, it registers that the user has checked in.

In this embodiment, the first voice data stored in the first data DB 121 is acquired before the user starts using the service. For example, the first voice data is acquired from the first server 60 when the user concludes an accommodation contract with the service provider through a travel agency.

The reception control unit 131 acquires the first voice data, information on the stay period, and the like from the first server 60.

The generation unit 134 generates a password that is valid for the received stay period.
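A password tied to the stay period could be modeled as in the sketch below. It assumes that the period is given as check-in and check-out dates and that the password is valid on both end dates, inclusive. The password format and all names are illustrative assumptions.

```python
import secrets
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class StayPassword:
    text: str
    valid_from: date    # check-in date received from the first server 60
    valid_until: date   # check-out date received from the first server 60

    def is_valid_on(self, day: date) -> bool:
        return self.valid_from <= day <= self.valid_until


def generate_stay_password(check_in: date, check_out: date, digits: int = 6) -> StayPassword:
    if check_out < check_in:
        raise ValueError("check-out must not precede check-in")
    text = "".join(secrets.choice("0123456789") for _ in range(digits))
    return StayPassword(text, check_in, check_out)


pw = generate_stay_password(date(2022, 4, 11), date(2022, 4, 13))
print(pw.is_valid_on(date(2022, 4, 12)), pw.is_valid_on(date(2022, 4, 14)))  # True False
```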

When the user uses the service, the presentation unit 135 presents the password to the user through the information processing device 41 installed at the predetermined equipment in the accommodation facility 50. Specifically, when the reception control unit 131 receives a password request from the information processing device 41, the presentation unit 135 transmits the password to the information processing device 41, which presents it to the user.

When the reception control unit 131 receives the second voice data from the information processing device 41, the authentication unit 136 performs user authentication using the first voice feature, the second voice feature, and the password. In response to successful user authentication, the authentication unit 136 registers the check-in by transmitting the authentication result to the second server 70. Specifically, when user authentication succeeds, the authentication unit 136 transmits the authentication result and the date and time to the second server 70. The second server 70 then uses them to register that the user has checked in.
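The check-in registration exchange can be sketched with an in-memory stand-in for the second server 70. The class, the method names, and the choice to key check-ins by a user ID are hypothetical. The source states only that the authentication result and the date and time are sent when authentication succeeds.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict


@dataclass
class CheckInServer:
    """Hypothetical in-memory stand-in for the second server 70."""
    check_ins: Dict[str, datetime] = field(default_factory=dict)

    def register(self, user_id: str, authenticated: bool, when: datetime) -> bool:
        if not authenticated:
            return False
        self.check_ins[user_id] = when   # record that the user has checked in
        return True


def on_authentication_result(user_id: str, authenticated: bool,
                             server: CheckInServer, now: datetime) -> bool:
    """Authentication unit side: send the result and date/time only on success."""
    if not authenticated:
        return False
    return server.register(user_id, authenticated, now)
```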

FIG. 12 is a flowchart illustrating an example of the flow of the authentication process performed by the information processing device 10. The information processing device 10 executes this process, for example, when it receives a password request from the information processing device 41.

In step S602, the generation unit 134 generates a password valid for the received stay period.

In step S606, if user authentication has succeeded, the authentication unit 136 registers the check-in by transmitting the authentication result to the second server 70, and ends the process.

(4) Summary
According to the present disclosure, information on the stay period is acquired from the first server, and a password valid for that stay period is generated. The password is presented to the user by the information processing device installed at the predetermined equipment in the accommodation facility, and the second voice data is received through the voice input device installed at that equipment. In response to successful user authentication, the check-in is registered by transmitting the authentication result to the second server. Here, the service is the provision of accommodation, and the first voice data is acquired from the first server before the user starts using the service. The first server is a server of the accommodation facility or a travel agency, or a web system provided by one of them. The second server manages the use of the accommodation facility. As a result, users can be authenticated without the accommodation facility itself having to learn their voices. Users can also check in without the accommodation facility deploying reception staff or the like. This reduces the burden on the service provider, including labor shortages and infection-prevention measures.

A physical key, an electronic key, or a password may be issued in response to registration of the check-in. Specifically, the authentication unit 136 notifies the information processing device 41 that user authentication has succeeded. On receiving this notification, the information processing device 41 provides the user with a key in one of several ways. For example, it may unlock a locker that stores a physical key, transmit an electronic key to the user terminal 20, or present the user with a password required for equipment in the facility. Furthermore, if the same voice authentication used at check-in is also used for unlocking a locker or the like, authentication can be performed without issuing a physical key.

With this configuration, automatic check-in causes the key for using the accommodation facility to be issued automatically. The service provider therefore no longer needs to handle check-in manually, which helps address labor shortages. Because no staff involvement is required, this configuration also helps prevent the spread of infectious diseases.

In addition, when a family or group using a facility is given physical keys, only a limited number of people hold the keys. The movements of the other members within the facility then depend on the key holders. The technology of the present disclosure, by contrast, can grant each user an unlocking right linked to that user's successful check-in authentication. It therefore improves each user's freedom of movement within the accommodation facility without imposing a physical burden on the facility or the users. It also reduces the risk of physical loss, such as a physical key being lost or damaged.

In this embodiment, the second server 70 acquires the information indicating a check-in from the information processing device 10, but this is not limiting. For example, the second server 70 may receive this information from the terminal used for check-in at the accommodation facility 50 (for example, the information processing device 41).

Although this embodiment has been described using check-in as an example, the same approach can of course be used for check-out.

<Fourth embodiment>
The fourth embodiment describes an example in which the above user authentication is used for authentication when a service is used. Specifically, the service is handled by a service provider's call center, for example accepting repair requests or answering inquiries about credit card statements. Components identical to those of the first and second embodiments are given the same reference numerals, and their description is omitted.

(1) Configuration of information processing system 4
FIG. 13 is a block diagram showing the configuration of the information processing system 4 according to the fourth embodiment.

As shown in FIG. 13, the information processing system 4 includes the information processing device 10, a user terminal 20, a network 30, and a call center 80. The information processing device 10, the user terminal 20, and the call center 80 are connected so that they can communicate with each other via the network 30 using a wired or wireless communication standard.

The call center 80 includes an information processing device 81 operated by a staff member. The information processing device 81 has the following functions.
- A function of making a call with the user.
- A function of outputting the password to an output device (for example, a display) connected to the information processing device 81.
- A function of transmitting the second voice data acquired through the call function to the information processing device 10.
- A function of outputting the authentication result of the user authentication, received from the information processing device 10, to an output device connected to the information processing device 81.

(2) Functions of the information processing device 10
The generation unit 134 generates a keyword used in conversation as the password, and also generates a question whose answer is that keyword. Specifically, the generation unit 134 generates a keyword of the kind that would naturally come up when a telephone receptionist talks with the user. As the keyword, the generation unit 134 generates, for example, information taken from the user's personal information managed by the call center 80, or weather information at the time of the conversation. The generation unit 134 then generates a question whose answer is the generated keyword. For example, keyword generation methods and questions are stored in advance in association with each other, and the generation unit 134 selects the question associated with the method it used to generate the keyword.

For example, assume the generation unit 134 generates the keyword from the user's date of birth, and the user's date of birth is April 1, 2000. In this case, the generation unit 134 generates as the keyword "nenshigatsu", a part of the spoken form of the date (Japanese for "year, April"). The generation unit 134 then generates a question such as "Please tell us your date of birth."
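The association between keyword generation methods and questions described above can be sketched as a lookup table. The following Python is a minimal, hypothetical sketch, not the disclosed implementation. The method names, the `profile` dictionary, and the use of English month names in place of the Japanese reading "nenshigatsu" are all assumptions.

```python
import secrets
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, Optional, Tuple

# English month names stand in for the Japanese reading used in the example
# above ("nenshigatsu"); converting dates to kana readings is out of scope.
MONTHS = ("january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december")


@dataclass(frozen=True)
class Challenge:
    password: str  # keyword the user is expected to say
    question: str  # question the receptionist asks to elicit the keyword


def birth_month(profile: dict) -> str:
    return MONTHS[profile["birth_date"].month - 1]


def birth_year(profile: dict) -> str:
    return str(profile["birth_date"].year)


# Hypothetical table: each keyword generation method is stored in advance
# together with the question whose answer contains the keyword.
METHODS: Dict[str, Tuple[Callable[[dict], str], str]] = {
    "birth_month": (birth_month, "Please tell us your date of birth."),
    "birth_year": (birth_year, "Please tell us your date of birth."),
}


def generate_challenge(profile: dict, method: Optional[str] = None) -> Challenge:
    """Pick a generation method (at random if none is given) and return the
    keyword to use as the password plus the question that elicits it."""
    if method is None:
        method = secrets.choice(sorted(METHODS))
    extract, question = METHODS[method]
    return Challenge(password=extract(profile), question=question)


profile = {"birth_date": date(2000, 4, 1)}  # hypothetical call-center record
challenge = generate_challenge(profile, "birth_month")
print(challenge.password, "/", challenge.question)
```

Run as a script, this prints `april / Please tell us your date of birth.` Each table entry keeps a method and its question together, so the question asked always elicits the keyword that was generated.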

The presentation unit 135 presents the password, together with the question whose answer is the keyword, to the service provider's telephone receptionist. Specifically, when the reception control unit 131 receives a password request from the information processing device 81, the presentation unit 135 transmits the password and the question to the information processing device 81. The information processing device 81 then presents them to the telephone receptionist. When the telephone receptionist asks the user the question, the information processing device 81 can accept input of the second voice data, which is the user's spoken answer.

When the reception control unit 131 receives the second voice data from the information processing device 81, the authentication unit 136 performs user authentication using the first voice feature amount, the second voice feature amount, and the password. The authentication unit 136 then transmits the authentication result to the information processing device 81. The information processing device 81 displays the result to the telephone receptionist, who can thereby confirm the user's identity.

FIG. 14 is a flowchart showing an example of the flow of the authentication process performed by the information processing device 10. The information processing device 10 executes this process at a timing such as when it receives a password request from the information processing device 81.

In step S701, the reception control unit 131 receives a password request from the information processing device 81.

In step S702, the generation unit 134 generates a keyword used in conversation as the password, and also generates a question whose answer is the keyword.

In step S703, the presentation unit 135 presents the password and the question whose answer is the keyword to the service provider's telephone receptionist.

In step S704, the authentication unit 136 transmits the authentication result of the user authentication to the information processing device 81 and ends the process.

(4) Summary
According to the present disclosure, a keyword used in conversation is generated as the password, together with a question whose answer is the keyword. The password and the question are presented to the service provider's telephone receptionist. Input of the second voice data is then accepted when the user is asked that question. This spares the user the trouble of stating personal identification information to the call center. Furthermore, because the user's voice features are also checked, impersonation can be prevented even if the user's personal information has been leaked. In addition, because the password is elicited through a question asked of the user, user authentication can be performed without the user being aware of being authenticated.

<Fifth embodiment>
In the fifth embodiment, an example is described in which the above user authentication is used to authenticate a user of a service. Specifically, the fifth embodiment describes a case in which the service is the use of a delivery locker. Components that are the same as in the first and second embodiments are given the same reference numerals, and their description is omitted.

(1) Configuration of information processing system 5
FIG. 15 is a block diagram showing the configuration of the information processing system 5 according to the fifth embodiment.

As shown in FIG. 15, the information processing system 5 includes the information processing device 10, a user terminal 20, a network 30, a delivery person's terminal 90, and a delivery locker 91. The information processing device 10, the user terminal 20, and the delivery person's terminal 90 are connected so that they can communicate with each other via the network 30 using a wired or wireless communication standard.

The delivery person's terminal 90 is a mobile terminal operated by the delivery person. After placing a package in the delivery locker, the delivery person enters a character string into the terminal 90. The character string is, for example, a string to be used as the password, or a string of information about the package. Information about the package includes, for example, the destination address, the destination telephone number, the desired receiving time, the delivery date and time, and a management number. The present disclosure describes, as an example, the case in which the delivery person enters a character string that is used as the password as-is. The terminal 90 transmits the character string to the information processing device 10.

The delivery locker 91 includes an information processing device 41 and a voice input device 42. The information processing device 41 has a function of unlocking the delivery locker 91 when the user authentication succeeds.

(2) Functions of the information processing device 10
The reception control unit 131 receives a temporary character string from the delivery person's terminal 90.

The generation unit 134 uses the received character string as the password. If the character string is not used as the password as-is, the generation unit 134 generates the password from the character string by a predetermined method. It may, for example, convert the string into a string that can be read aloud using an arbitrary conversion scheme. Alternatively, it may compute a hash value of the string and combine readable strings corresponding to that value.
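One way to implement the hash-based option is sketched below in Python. This is an illustrative assumption, not the disclosed method. SHA-256, the syllable table, and the default of four syllables are hypothetical choices that the disclosure does not specify.

```python
import hashlib

# Hypothetical table of 32 easy-to-pronounce syllables; 256 is a multiple
# of 32, so `byte % 32` picks each syllable with equal probability.
SYLLABLES = ("ka", "ki", "ku", "ke", "ko", "sa", "shi", "su", "se", "so",
             "ta", "chi", "tsu", "te", "to", "na", "ni", "nu", "ne", "no",
             "ha", "hi", "fu", "he", "ho", "ma", "mi", "mu", "me", "mo",
             "ra", "ri")


def password_from_string(text: str, use_as_is: bool = True,
                         n_syllables: int = 4) -> str:
    """Return the password for a string entered by the delivery person.

    If use_as_is is True the string itself is the password; otherwise the
    SHA-256 digest of the string (32 bytes, so n_syllables <= 32) is mapped
    to readable syllables joined by hyphens.
    """
    if use_as_is:
        return text
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return "-".join(SYLLABLES[b % len(SYLLABLES)] for b in digest[:n_syllables])
```

Each syllable carries 5 bits, so four syllables give about 20 bits. A larger `n_syllables` makes the password harder to guess but longer to read aloud.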

When user authentication succeeds, the authentication unit 136 unlocks the locker in which the delivery person stored the delivered item. Specifically, the authentication unit 136 transmits the authentication result of the user authentication to the delivery locker 91.
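A minimal sketch of the locker side follows, assuming a hypothetical in-memory `LockerBank` that receives the authentication result. It also covers the case, mentioned in the summary of this embodiment, where one representative password is set for several lockers. All class and method names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LockerBank:
    """Hypothetical in-memory model of a bank of delivery lockers."""
    locked: Dict[str, bool] = field(default_factory=dict)
    by_password: Dict[str, List[str]] = field(default_factory=dict)

    def deposit(self, locker_ids: List[str], password: str) -> None:
        # The delivery person locks one or more lockers under one
        # representative password.
        for locker_id in locker_ids:
            self.locked[locker_id] = True
        self.by_password.setdefault(password, []).extend(locker_ids)

    def on_auth_result(self, password: str, success: bool) -> List[str]:
        # Called with the result sent by the authentication unit; unlocks
        # every locker registered under the password, but only on success.
        if not success:
            return []
        opened = self.by_password.pop(password, [])
        for locker_id in opened:
            self.locked[locker_id] = False
        return opened
```

For example, after `deposit(["A1", "A2"], "kasumi")`, a single successful authentication opens both lockers. A failed one leaves them locked.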

FIG. 16 is a flowchart showing an example of the flow of the authentication process performed by the information processing device 10. The information processing device 10 executes this process at any timing, for example when the user operates the delivery locker 91.

In step S802, the generation unit 134 uses the received character string as the password.

In step S806, the authentication unit 136 transmits the authentication result of the user authentication to the delivery locker 91 and ends the process.

(4) Summary
According to the present disclosure, a temporary character string is received from the delivery person's terminal and used as the password. When user authentication succeeds, the locker in which the delivery person stored the delivered item is unlocked. As a result, no absence notice slip is needed when the delivery person sets the password for the delivery locker.

Furthermore, when there are several items to deliver, several delivery lockers may be used. If one representative password is set for all of those lockers, the user can unlock them at once with a single authentication.

<Sixth embodiment>
In the sixth embodiment, an example is described in which the above user authentication is used to authenticate a user of a service. Specifically, the sixth embodiment describes a case in which voice authentication is performed at a general-purpose locker. Components that are the same as in the first embodiment are given the same reference numerals, and their description is omitted.

(1) Configuration of information processing system 6
FIG. 17 is a block diagram showing the configuration of the information processing system 6 according to the sixth embodiment.

As shown in FIG. 17, the information processing system 6 includes the information processing device 10, a user terminal 20, a network 30, and a locker 94. The information processing device 10, the user terminal 20, and the locker 94 are connected so that they can communicate with each other via the network 30 using a wired or wireless communication standard.

The locker 94 is connected to the information processing device 41 and the voice input device 42.
The information processing device 41 further has the following function.
- A function of unlocking the locker 94 in response to receiving an unlocking instruction from the information processing device 10.

(2) Functions of the information processing device 10
The reception control unit 131 receives an authentication request from the information processing device 41.

The generation unit 134 generates the password in response to receiving the authentication request.

When user authentication succeeds, the authentication unit 136 unlocks the locker used by the user. Specifically, when user authentication succeeds, the authentication unit 136 transmits an instruction to unlock the locker to the information processing device 41. This causes the information processing device 41 to unlock the locker in accordance with the unlocking instruction.

FIG. 18 is a flowchart showing an example of the flow of the authentication process performed by the information processing device 10. The information processing device 10 executes this process at a timing such as when it receives a password request from the information processing device 41.

In step S901, the reception control unit 131 receives an authentication request from the information processing device 41.

In step S902, the generation unit 134 generates the password in response to receiving the authentication request.
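Steps S901 and S902 can be sketched as a short-lived, single-use password store keyed by the requesting terminal. This is a hypothetical Python sketch. The six-digit format, the 60-second lifetime, and the identifiers are assumptions. The sketch covers only the password check, not the speaker verification that the user authentication also requires.

```python
import secrets
import time
from typing import Dict, Optional, Tuple


class TempPasswordStore:
    """Hypothetical store of short-lived passwords, one per locker terminal."""

    def __init__(self, ttl_seconds: float = 60.0, digits: int = 6) -> None:
        self.ttl = ttl_seconds
        self.digits = digits
        self._issued: Dict[str, Tuple[str, float]] = {}

    def issue(self, terminal_id: str, now: Optional[float] = None) -> str:
        # Called when an authentication request arrives from the terminal.
        now = time.monotonic() if now is None else now
        password = "".join(secrets.choice("0123456789") for _ in range(self.digits))
        self._issued[terminal_id] = (password, now + self.ttl)
        return password

    def consume(self, terminal_id: str, transcript: str,
                now: Optional[float] = None) -> bool:
        # Single use: the entry is removed whether or not it matches.
        now = time.monotonic() if now is None else now
        entry = self._issued.pop(terminal_id, None)
        if entry is None:
            return False
        password, expires_at = entry
        return now <= expires_at and secrets.compare_digest(
            password.encode("utf-8"), transcript.encode("utf-8"))
```

Both methods accept an explicit `now` so that expiry can be tested without waiting.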

In step S906, if user authentication has succeeded, the authentication unit 136 transmits an instruction to unlock the locker to the information processing device 41 and ends the process.

(4) Summary
According to the present disclosure, an authentication request is accepted from another terminal, and a password is generated in response. When user authentication succeeds, the locker used by the user is unlocked. This allows the user to use a locker without a physical key and without image recognition.
For example, image recognition may be undesirable for lockers at hot springs, pools, gyms, and similar places, because users have a lot of exposed skin and because of privacy concerns. Moreover, such lockers have required users to carry a temporary physical key at all times, which is cumbersome. For example, users had to tie the locker key to an arm or ankle while in the bath. The present disclosure can resolve both these privacy problems and this inconvenience to users.

<Modifications>
Although embodiments of the disclosure have been described above, they can be implemented in various other forms, with various omissions, substitutions, and changes. These embodiments and modifications, including those with omissions, substitutions, and changes, fall within the technical scope of the claims and their equivalents.

For example, the generation unit 134 may be configured to generate the password on condition that another authentication has succeeded. The other authentication is a method other than voice authentication, such as conventional password authentication or telephone number authentication. In this case, the reception control unit 131 receives a notification that the other authentication has succeeded, and the generation unit 134 generates the password in response. Combined with another authentication in this way, the technology of the present disclosure can further increase security. The authentication technology of the present disclosure can also increase security when it is incorporated, together with another authentication, into two-step authentication. In particular, on a mobile terminal such as a smartphone, combining it with fingerprint or iris authentication enables highly secure authentication without requiring the user to type a character string.
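The condition that the password is generated only after another authentication succeeds can be sketched as follows. The Python below is a hypothetical illustration. The user IDs, the six-digit password, and the one-challenge-per-success policy are assumptions.

```python
import secrets
from typing import Set


class GatedPasswordGenerator:
    """Issues a voice-challenge password only after another factor passed.

    Hypothetical sketch: user IDs and the single-use policy are assumptions.
    """

    def __init__(self) -> None:
        self._other_auth_ok: Set[str] = set()

    def record_other_auth_success(self, user_id: str) -> None:
        # e.g. a notification that fingerprint or conventional password
        # authentication succeeded for this user
        self._other_auth_ok.add(user_id)

    def generate(self, user_id: str, digits: int = 6) -> str:
        if user_id not in self._other_auth_ok:
            raise PermissionError("other authentication has not succeeded")
        self._other_auth_ok.discard(user_id)  # one challenge per success
        return "".join(secrets.choice("0123456789") for _ in range(digits))
```

Clearing the flag after each password means a second voice challenge requires a fresh first-factor success.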

Each function of the information processing device 10 may also be implemented in another device. For example, each DB of the storage unit 120 may be built as an external database.

<Additional notes>
The matters described in the above embodiments are noted below.

(Additional Note 1) A program to be executed by a computer (for example, the information processing device 10) including a processor (12) and a memory (11), the program causing the processor to execute: a step (S304) of extracting a first voice feature amount from first voice data of a user registered in advance; a step (S402) of generating a temporary password; a step (S403) of presenting the password to the user; a step (S404) of accepting input of second voice data in which the user reads out the password; a step (S405) of extracting a second voice feature amount from the accepted second voice data; and a step (S405) of performing user authentication using the first voice feature amount, the second voice feature amount, and the password, wherein the first voice feature amount and the second voice feature amount are voice features represented by vectors.

(Additional Note 2) The program according to (Additional Note 1), further causing execution of a step (S303) of acquiring a trained model that has been trained in advance to output, in response to input of voice data, a voice feature amount representing the user's voice features as a vector, wherein in the step of extracting the first voice feature amount, the first voice feature amount is extracted using the first voice data and the trained model, and in the step of extracting the second voice feature amount, the second voice feature amount is extracted using the second voice data and the trained model.

(Additional Note 3) The program according to (Additional Note 2), causing execution of a step (S302) of converting the first voice data, using mel-frequency cepstral coefficients, into a third voice feature amount represented by a two-dimensional vector, and a step (S461) of converting the second voice data, using mel-frequency cepstral coefficients, into a fourth voice feature amount represented by a two-dimensional vector, wherein in the step of extracting the first voice feature amount, the first voice feature amount is extracted using the third voice feature amount and the trained model; in the step of extracting the second voice feature amount, the second voice feature amount is extracted using the fourth voice feature amount and the trained model; and the trained model is trained in advance to output, upon input of the two-dimensional vector, the voice feature amount represented by the vector.

(Additional Note 4) The program according to any one of (Additional Note 1) to (Additional Note 3), causing execution of a step (S464) of converting the second voice data into text data, a step (S465) of performing password authentication using the password and the text data, and a step (S466) of performing speaker authentication according to the distance between the first voice feature amount and the second voice feature amount, wherein in the step of performing user authentication, user authentication is performed using the authentication result of the password authentication and the authentication result of the speaker authentication.
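Additional Note 4 combines a text comparison with an embedding distance. The Python sketch below rests on several assumptions that the disclosure does not specify: cosine distance as the distance measure, a hypothetical threshold of 0.3, a simple text normalization, and a rule that both checks must pass. The speech-to-text step (S464) and the model that produces the embeddings are outside the sketch.

```python
import math
import unicodedata
from typing import Sequence


def normalize(text: str) -> str:
    # Assumed normalization: fold full-width characters (NFKC), lowercase,
    # and drop spaces and punctuation before comparing.
    text = unicodedata.normalize("NFKC", text).lower()
    return "".join(ch for ch in text if ch.isalnum())


def password_auth(password: str, transcript: str) -> bool:
    """Compare the issued password with the speech-to-text result (S465)."""
    return normalize(password) == normalize(transcript)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length embedding")
    return 1.0 - dot / (norm_a * norm_b)


def speaker_auth(enrolled: Sequence[float], probe: Sequence[float],
                 threshold: float = 0.3) -> bool:
    """Accept the speaker if the two embeddings are close enough (S466)."""
    return cosine_distance(enrolled, probe) <= threshold


def user_auth(password: str, transcript: str, enrolled: Sequence[float],
              probe: Sequence[float], threshold: float = 0.3) -> bool:
    # Both results must succeed; this AND rule is an assumption.
    return password_auth(password, transcript) and speaker_auth(
        enrolled, probe, threshold)
```

Requiring both checks means that a replayed recording of the wrong phrase fails the text check, and a different speaker saying the right phrase fails the distance check.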

(Additional Note 5) The program according to (Additional Note 1), further causing execution of a step of permitting use of a service in response to successful user authentication in the step of performing user authentication, wherein the first voice data is acquired before the user starts using the service, and in the presenting step, the password is presented to the user when the user uses the service.

(Additional Note 6) The program according to (Additional Note 5), wherein in the presenting step, the password is presented to the user by an information processing device installed at predetermined equipment in a facility; in the step of accepting input of the second voice data, the input of the second voice data is accepted through a voice input device installed at the equipment; and in the permitting step, in response to successful user authentication, the equipment is unlocked or a device related to providing the service is started.

(Additional Note 7) The program according to (Additional Note 5), causing execution of a step of acquiring information on an accommodation period from a first server, wherein in the generating step, the password valid for the accommodation period is generated; in the presenting step, the password is presented to the user by an information processing device installed at predetermined equipment in the accommodation facility; in the step of accepting input of the second voice data, the input of the second voice data is accepted through a voice input device installed at the equipment; in the permitting step, in response to successful user authentication, check-in is registered by transmitting the authentication result of the user authentication to a second server; the service is provision of the use of an accommodation facility; the first voice data is acquired from the first server before the user starts using the service; the first server is a server of the accommodation facility or a travel agency, or a web system provided by the accommodation facility or travel agency; and the second server is a server that manages use of the accommodation facility.
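Generating a password that is valid only for the accommodation period, as in Additional Note 7, might look like the following hypothetical Python sketch. The six-digit format is an assumption, as is treating the check-in and check-out times as the validity window.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class StayPassword:
    password: str
    valid_from: datetime   # start of the accommodation period (check-in)
    valid_until: datetime  # end of the accommodation period (check-out)

    def is_valid_at(self, moment: datetime) -> bool:
        return self.valid_from <= moment <= self.valid_until


def issue_stay_password(check_in: datetime, check_out: datetime,
                        digits: int = 6) -> StayPassword:
    """Generate a password valid only during the stay (hypothetical helper)."""
    if check_out <= check_in:
        raise ValueError("check-out must be after check-in")
    password = "".join(secrets.choice("0123456789") for _ in range(digits))
    return StayPassword(password, check_in, check_out)
```

The period would come from the first server in the note; here it is simply passed in as two datetimes.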

(Additional Note 8) The program according to (Additional Note 7), causing execution of a step of issuing, in response to registering the check-in, a physical key, an electronic key, or a password for unlocking the equipment or the device.

(Additional Note 9) The program according to (Additional Note 5), wherein in the presenting step, the password is presented to a telephone receptionist of the provider of the service.

(Additional Note 10) The program according to (Additional Note 9), wherein in the generating step, a keyword used in conversation is generated as the password, and a question whose answer is the keyword is generated; in the presenting step, the password and the question whose answer is the keyword are presented to the telephone receptionist of the provider of the service; and in the step of accepting input of the second voice data, the input of the second voice data is accepted by asking the question whose answer is the keyword.

(Additional Note 11) The program according to (Additional Note 5), causing execution of a step of receiving a temporary character string from a delivery person's terminal, wherein in the generating step, the received character string is used as the password, and in the permitting step, in response to successful user authentication, the locker in which the delivery person stored a delivered item is unlocked.

(Additional Note 12) The program according to (Additional Note 5), causing execution of a step of accepting an authentication request from another terminal, wherein in the generating step, the password is generated in response to accepting the authentication request, and in the permitting step, the locker used by the user is unlocked in response to successful user authentication.

(Additional Note 13) The program according to (Additional Note 1), causing execution of a step of accepting a notification that another authentication has succeeded, wherein in the generating step, the password is generated in response to the success of the other authentication.

(Additional Note 14) An information processing device (10) including a processor (12), wherein the processor executes: a step (S304) of extracting a first voice feature amount from first voice data of a user registered in advance; a step (S402) of generating a temporary password; a step (S403) of presenting the password to the user; a step (S404) of accepting input of second voice data in which the user reads out the password; a step (S405) of extracting a second voice feature amount from the accepted second voice data; and a step (S405) of performing user authentication using the first voice feature amount, the second voice feature amount, and the password, and wherein the first voice feature amount and the second voice feature amount are voice features represented by vectors.

（付記１５）プロセッサ（１２）を備えるコンピュータ（例えば、情報処理装置１０）が、予め登録されたユーザの第１音声データから、第１の音声特徴量を抽出するステップ（Ｓ３０４）と、一時的なパスワードを生成するステップ（Ｓ４０２）と、前記ユーザに、前記パスワードを提示するステップ（Ｓ４０３）と、前記ユーザが前記パスワードを読み上げた第２音声データの入力を受け付けるステップ（Ｓ４０４）と、受け付けた前記第２音声データから、第２の音声特徴量を抽出するステップ（Ｓ４０５）と、前記第１の音声特徴量と、前記第２の音声特徴量と、前記パスワードとを用いて、ユーザ認証を行うステップ（Ｓ４０５）と、を実行し、前記第１の音声特徴量と、前記第２の音声特徴量とは、ベクトルで表される音声特徴である、方法。 (Additional Note 15) A method in which a computer (for example, the information processing device 10) comprising a processor (12) executes: a step (S304) of extracting a first voice feature amount from first voice data of a user registered in advance; a step (S402) of generating a temporary password; a step (S403) of presenting the password to the user; a step (S404) of receiving input of second voice data in which the user reads out the password; a step (S405) of extracting a second voice feature amount from the received second voice data; and a step (S405) of performing user authentication using the first voice feature amount, the second voice feature amount, and the password, wherein the first voice feature amount and the second voice feature amount are voice features represented by vectors.
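
The sequence of steps in Additional Notes 14 and 15 (S304 to S405) can be sketched as a small Python class. This is a minimal sketch, not the publication's implementation: the class name `VoicePasswordAuthenticator`, the injected callables `extract`, `transcribe` and `speaker_match`, the four-digit numeric password, and the rule that both the spoken text and the voice must match are assumptions made for illustration.

```python
import secrets
from dataclasses import dataclass
from typing import Callable, List, Optional

Vector = List[float]


@dataclass
class VoicePasswordAuthenticator:
    """Hypothetical orchestration of steps S304-S405; components are injected."""
    extract: Callable[[bytes], Vector]               # voice data -> feature vector
    transcribe: Callable[[bytes], str]               # voice data -> recognised text
    speaker_match: Callable[[Vector, Vector], bool]  # same speaker?
    enrolled: Optional[Vector] = None
    password: Optional[str] = None

    def enroll(self, first_voice: bytes) -> None:
        # S304: first voice feature amount from pre-registered voice data.
        self.enrolled = self.extract(first_voice)

    def issue_password(self, digits: int = 4) -> str:
        # S402: temporary password; digits are assumed to be easy to read aloud.
        self.password = "".join(secrets.choice("0123456789") for _ in range(digits))
        return self.password  # S403: the caller presents this to the user

    def authenticate(self, second_voice: bytes) -> bool:
        # S404/S405: receive the reading, extract the second vector, then
        # require both the spoken text and the voice to match.
        if self.enrolled is None or self.password is None:
            return False
        expected, self.password = self.password, None  # one attempt per password
        second = self.extract(second_voice)
        text_ok = self.transcribe(second_voice) == expected
        return text_ok and self.speaker_match(self.enrolled, second)
```

Injecting the components keeps the step order independent of the choice of feature extractor or speech recogniser. Clearing the password after each attempt means a recording of an earlier reading is unlikely to match a newly issued password.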

１情報処理システム、２情報処理システム、３情報処理システム、４情報処理システム、５情報処理システム、６情報処理システム、１０情報処理装置、１１記憶装置、１２プロセッサ、１３入出力インターフェース、１４通信インターフェース、２０ユーザ端末、２１記憶装置、２２プロセッサ、２３入出力インターフェース、２４通信インターフェース、３０ネットワーク、４０施設、４１情報処理装置、４２音声入力装置、５０宿泊施設、６０第１のサーバ、７０第２のサーバ、８０コールセンター、８１情報処理装置、９０端末、９１宅配ロッカー、９４ロッカー、１１０通信部、１２０記憶部、１３０制御部、１３１受信制御部、１３２送信制御部、１３３抽出部、１３４生成部、１３５提示部、１３６認証部。
1 information processing system, 2 information processing system, 3 information processing system, 4 information processing system, 5 information processing system, 6 information processing system, 10 information processing device, 11 storage device, 12 processor, 13 input/output interface, 14 communication interface, 20 user terminal, 21 storage device, 22 processor, 23 input/output interface, 24 communication interface, 30 network, 40 facility, 41 information processing device, 42 voice input device, 50 accommodation facility, 60 first server, 70 second server, 80 call center, 81 information processing device, 90 terminal, 91 delivery locker, 94 locker, 110 communication unit, 120 storage unit, 130 control unit, 131 reception control unit, 132 transmission control unit, 133 extraction unit, 134 generation unit, 135 presentation unit, 136 authentication unit.

Claims

1. A program to be executed by a computer including a processor and a memory, the program causing the processor to execute:
a step of extracting a first voice feature amount from first voice data of a user registered in advance;
a step of generating a temporary password;
a step of presenting the password to the user;
a step of receiving input of second voice data in which the user reads out the password;
a step of extracting a second voice feature amount from the received second voice data; and
a step of performing user authentication using the first voice feature amount, the second voice feature amount, and the password,
wherein the first voice feature amount and the second voice feature amount are voice features represented by vectors.

2. The program according to claim 1, further causing the processor to execute a step of obtaining a trained model that has been trained in advance to output, in response to input of voice data, a voice feature amount representing a voice feature of the user as a vector,
wherein in the step of extracting the first voice feature amount, the first voice feature amount is extracted using the first voice data and the trained model, and
in the step of extracting the second voice feature amount, the second voice feature amount is extracted using the second voice data and the trained model.
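
Claim 2 only requires a pre-trained model that maps voice data to a vector. The sketch below shows that interface with NumPy. `ToySpeakerEncoder` and `extract_voice_feature` are hypothetical; the encoder uses a fixed random projection, mean pooling over time and L2 normalisation in place of learned weights, so it illustrates the data flow only and is not a trained speaker model. It assumes the input has already been converted to a frames-by-features array.

```python
import numpy as np


class ToySpeakerEncoder:
    """Stand-in for the pre-trained model of claim 2 (NOT a trained network).

    A real system would load learned weights; a fixed random projection
    plays that role here so the interface can be exercised end to end.
    """

    def __init__(self, n_inputs: int, dim: int = 16, seed: int = 0) -> None:
        rng = np.random.default_rng(seed)
        self.weights = rng.standard_normal((n_inputs, dim))

    def __call__(self, frames: np.ndarray) -> np.ndarray:
        # frames: (n_frames, n_inputs) -> one fixed-length vector per utterance
        hidden = np.tanh(frames @ self.weights)            # per-frame projection
        pooled = hidden.mean(axis=0)                       # average over time
        return pooled / (np.linalg.norm(pooled) + 1e-12)   # unit length


def extract_voice_feature(model: ToySpeakerEncoder, frames: np.ndarray) -> np.ndarray:
    # The enrolment voice (first) and the spoken password (second) pass
    # through the same model, so the two vectors are directly comparable.
    return model(frames)
```

Because of the pooling step, utterances of different lengths yield vectors of the same dimension, which is what allows the two voice feature amounts to be compared by distance.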

3. The program according to claim 2, further causing the processor to execute:
a step of converting the first voice data into a third voice feature amount represented by a two-dimensional vector using Mel-frequency cepstral coefficients; and
a step of converting the second voice data into a fourth voice feature amount represented by a two-dimensional vector using Mel-frequency cepstral coefficients,
wherein in the step of extracting the first voice feature amount, the first voice feature amount is extracted using the third voice feature amount and the trained model,
in the step of extracting the second voice feature amount, the second voice feature amount is extracted using the fourth voice feature amount and the trained model, and
the trained model has been trained in advance to output the voice feature amount represented by the vector in response to input of the two-dimensional vector.
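
Claim 3 converts voice data into Mel-frequency cepstral coefficients before it reaches the model. A standard MFCC pipeline (framing, Hamming window, power spectrum, triangular mel filterbank, logarithm, unscaled DCT-II) is sketched below in NumPy. The frame length, hop, FFT size and numbers of filters and coefficients are common defaults for 16 kHz speech, not values from the publication, and reading the "two-dimensional vector" as a frames-by-coefficients matrix is an assumption.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc(signal: np.ndarray, sr: int = 16000, frame_len: int = 400,
         hop: int = 160, n_fft: int = 512, n_mels: int = 26,
         n_ceps: int = 13) -> np.ndarray:
    """Return an (n_frames, n_ceps) matrix of MFCCs."""
    # 1. Split into overlapping frames and apply a Hamming window.
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # 2. Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # 3. Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    log_mel = np.log(power @ fbank.T + 1e-10)
    # 4. DCT-II over the mel axis, keeping the first n_ceps coefficients.
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[None, :] + 0.5) * np.arange(n_ceps)[:, None])
    return log_mel @ dct.T
```

For one second of 16 kHz audio these settings give 98 frames of 13 coefficients, i.e. a 98 x 13 matrix that would be fed to the trained model of claim 2.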

4. The program according to any one of claims 1 to 3, further causing the processor to execute:
a step of converting the second voice data into text data;
a step of performing password authentication using the password and the text data; and
a step of performing speaker authentication according to the distance between the first voice feature amount and the second voice feature amount,
wherein in the step of performing user authentication, user authentication is performed using the result of the password authentication and the result of the speaker authentication.
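
Claim 4 splits user authentication into a text check against the password and a distance check between the two voice vectors. The sketch below assumes cosine distance (Euclidean distance would fit the claim wording equally well), a hypothetical threshold of 0.35 that would need tuning on real data, NFKC normalisation of the recogniser output, and a final decision that requires both checks to pass; none of these specifics comes from the publication.

```python
import math
import unicodedata
from typing import Sequence, Tuple

SPEAKER_THRESHOLD = 0.35  # hypothetical cosine-distance threshold; must be tuned


def normalize(text: str) -> str:
    # NFKC folds full-width digits such as "４７２１" to "4721";
    # whitespace inserted by the speech recogniser is dropped.
    return "".join(unicodedata.normalize("NFKC", text).split()).lower()


def password_auth(password: str, transcript: str) -> bool:
    return normalize(transcript) == normalize(password)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def speaker_auth(first: Sequence[float], second: Sequence[float],
                 threshold: float = SPEAKER_THRESHOLD) -> bool:
    # Smaller distance means the two voices are more likely the same speaker.
    return cosine_distance(first, second) <= threshold


def user_auth(password: str, transcript: str,
              first: Sequence[float], second: Sequence[float]) -> Tuple[bool, bool, bool]:
    # Returns (password result, speaker result, combined result).
    p_ok = password_auth(password, transcript)
    s_ok = speaker_auth(first, second)
    return p_ok, s_ok, p_ok and s_ok
```

Returning both sub-results, not just the combined flag, lets a caller log why an attempt failed.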

5. The program according to claim 1, further causing the processor to execute a step of allowing use of a service in response to the user authentication succeeding in the step of performing user authentication,
wherein the first voice data is acquired before the user starts using the service, and
in the presenting step, the password is presented to the user when the user uses the service.

6. The program according to claim 5, wherein in the presenting step, the password is presented to the user by an information processing device installed at predetermined equipment in a facility,
in the step of receiving the input of the second voice data, the input of the second voice data is received through a voice input device installed at the equipment, and
in the step of allowing, in response to the user authentication succeeding, the equipment is unlocked or a device related to providing the service is activated.

7. The program according to claim 5, further causing the processor to execute a step of obtaining information regarding the period of stay from a first server,
wherein in the generating step, a password valid for the period of stay is generated,
in the presenting step, the password is presented to the user by an information processing device installed at predetermined equipment in an accommodation facility,
in the step of receiving the input of the second voice data, the input of the second voice data is received through a voice input device installed at the equipment,
in the step of allowing, in response to the user authentication succeeding, check-in is registered by transmitting the result of the user authentication to a second server,
the service is the provision of the accommodation facility,
the first voice data is obtained from the first server before the user starts using the service,
the first server is a server of the accommodation facility or of a travel agency, or a web system provided by the accommodation facility or the travel agency, and
the second server is a server that manages use of the accommodation facility.
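
Claim 7 ties the password's validity to the period of stay obtained from the first server. A minimal sketch, assuming the period arrives as check-in and check-out datetimes; `StayPassword` and `issue_stay_password` are hypothetical names, and the exchanges with the first and second servers are omitted.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class StayPassword:
    """Hypothetical record of a password valid only during the period of stay."""
    password: str
    valid_from: datetime   # e.g. check-in time reported by the first server
    valid_until: datetime  # e.g. check-out time reported by the first server

    def is_valid_at(self, when: datetime) -> bool:
        return self.valid_from <= when <= self.valid_until


def issue_stay_password(check_in: datetime, check_out: datetime,
                        digits: int = 6) -> StayPassword:
    if check_out <= check_in:
        raise ValueError("check-out must be after check-in")
    code = "".join(secrets.choice("0123456789") for _ in range(digits))
    return StayPassword(code, check_in, check_out)
```

Voice authentication at the equipment would first call `is_valid_at` with the current time and reject the attempt outside the stay window.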

8. The program according to claim 7, further causing the processor to execute a step of issuing, in response to the registration of the check-in, a physical key, an electronic key, or a password for unlocking the equipment or the device.

9. The program according to claim 5, wherein in the presenting step, the password is presented to a telephone receptionist of the provider of the service.

10. The program according to claim 9, wherein in the generating step, a keyword to be used in conversation is generated as the password, and a question to which the keyword is the answer is generated,
in the presenting step, the password and the question to which the keyword is the answer are presented to the telephone receptionist of the provider of the service, and
in the step of receiving the input of the second voice data, the input of the second voice data is received by asking the question to which the keyword is the answer.

11. The program according to claim 5, further causing the processor to execute a step of receiving a temporary character string from a terminal of a delivery person,
wherein in the generating step, the received character string is generated as the password, and
in the step of allowing, in response to the user authentication succeeding, a locker in which a delivery item has been stored by the delivery person is unlocked.

12. The program according to claim 5, further causing the processor to execute a step of receiving an authentication request from another terminal,
wherein in the generating step, the password is generated in response to receiving the authentication request, and
in the step of allowing, in response to the user authentication succeeding, a locker used by the user is unlocked.

13. The program according to claim 1, further causing the processor to execute a step of accepting that another authentication has succeeded,
wherein in the generating step, the password is generated in response to the success of the other authentication.

14. An information processing device comprising a processor, wherein the processor executes:
a step of extracting a first voice feature amount from first voice data of a user registered in advance;
a step of generating a temporary password;
a step of presenting the password to the user;
a step of receiving input of second voice data in which the user reads out the password;
a step of extracting a second voice feature amount from the received second voice data; and
a step of performing user authentication using the first voice feature amount, the second voice feature amount, and the password,
wherein the first voice feature amount and the second voice feature amount are voice features represented by vectors.

15. A method in which a computer equipped with a processor executes:
a step of extracting a first voice feature amount from first voice data of a user registered in advance;
a step of generating a temporary password;
a step of presenting the password to the user;
a step of receiving input of second voice data in which the user reads out the password;
a step of extracting a second voice feature amount from the received second voice data; and
a step of performing user authentication using the first voice feature amount, the second voice feature amount, and the password,
wherein the first voice feature amount and the second voice feature amount are voice features represented by vectors.