JP7493087B1 - Information processing device and information processing method

Info

Publication number: JP7493087B1
Application number: JP2023202916A
Authority: JP
Inventors: 清良披田野; 茂莉黒川; 求山口; 善則浅川; 仁志宮嵜
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2023-11-30
Filing date: 2023-11-30
Publication date: 2024-05-30
Anticipated expiration: 2043-11-30

Abstract

【課題】ユーザ情報を収集する過程で、匿名化処理が行われていないユーザ情報が流出しない情報処理装置及び情報処理方法を提供する。【解決手段】情報処理装置１は、第１データ群と第２データ群とを統合して統合データ群とした場合における、統合データ群に含まれる複数のデータに対して所定の確率でノイズが付与されているように第１データ群にノイズを付与する第１ノイズ付与クエリと、第２データ群にノイズを付与する第２ノイズ付与クエリと、第１データ群に含まれる複数のデータ識別情報を不可逆変換するクエリである第１不可逆変換クエリと、第２データ群に含まれる複数のデータ識別情報を不可逆変換するクエリである第２不可逆変換クエリとを生成する生成部１３２と、第１ノイズ付与クエリと第１不可逆変換クエリとを第１装置２に送信し、第２ノイズ付与クエリと第２不可逆変換クエリとを第２装置３に送信する送信部１３３と、を有する。【選択図】図２ [Problem] To provide an information processing device and an information processing method that prevent user information that has not been anonymized from leaking during the process of collecting user information. [Solution] An information processing device (1) has a generation unit (132) that generates a first noise-adding query that adds noise to a first data group such that, when the first data group and a second data group are integrated to form an integrated data group, noise is added with a predetermined probability to a plurality of data included in the integrated data group, a second noise-adding query that adds noise to the second data group, a first irreversible conversion query that irreversibly converts a plurality of data identification information included in the first data group, and a second irreversible conversion query that irreversibly converts a plurality of data identification information included in the second data group; and a transmission unit (133) that transmits the first noise-adding query and the first irreversible conversion query to a first device (2) and transmits the second noise-adding query and the second irreversible conversion query to a second device (3). [Selected Figure] FIG. 2

Description

本発明は、情報処理装置及び情報処理方法に関する。 The present invention relates to an information processing device and an information processing method.

従来、複数の事業者からユーザに関する情報であるユーザ情報を収集し、データ分析を行うことが実施されている。この場合、ユーザのプライバシーを保護するために、複数の事業者から収集したユーザ情報の少なくとも一部を匿名化することが行われている。例えば、特許文献１には、複数のユーザ情報を結合するための結合キーとなるデータに対して不可逆変換等を行い、変換後の結合キーを用いて複数の事業者それぞれに対応するユーザの個人情報を結合し、結合後のデータに対して追加的に匿名化処理を行うシステムが開示されている。 Conventionally, user information, which is information about users, is collected from multiple businesses and data analysis is performed. In this case, in order to protect the privacy of users, at least a portion of the user information collected from the multiple businesses is anonymized. For example, Patent Literature 1 discloses a system that performs irreversible conversion or the like on data that serves as a combining key for combining multiple pieces of user information, combines personal information of users corresponding to each of the multiple businesses using the converted combining key, and performs additional anonymization processing on the combined data.

特開２０２１－１１７６７９号公報JP 2021-117679 A

従来の技術では、複数の事業者それぞれからユーザ情報を収集してから匿名化処理を行うため、ユーザ情報を収集する過程で、匿名化処理が行われていないユーザ情報が流出してしまうおそれがある。 Conventional technology involves collecting user information from multiple businesses before anonymizing it, so there is a risk that unanonymized user information may be leaked during the process of collecting the user information.

そこで、本発明はこれらの点に鑑みてなされたものであり、ユーザ情報を収集する過程で、匿名化処理が行われていないユーザ情報が流出しないようにすることを目的とする。 The present invention has been made in consideration of these points, and aims to prevent the leakage of user information that has not been anonymized during the process of collecting user information.

本発明の第１の態様に係る情報処理装置は、データを識別するためのデータ識別情報と第１データとを関連付けた複数の第１レコードを含む第１データ群と、前記データ識別情報と第２データとを関連付けた複数の第２レコードを含む第２データ群とのうちの前記第１データ群に含まれる複数の前記第１データそれぞれに、前記第１データ群と前記第２データ群とを統合して統合データ群とした場合における、前記統合データ群に含まれる複数のデータに対して所定の確率でノイズが付与されているようにノイズを付与するクエリである第１ノイズ付与クエリと、前記第１データ群に含まれる複数の前記データ識別情報を所定の方法により不可逆変換するクエリである第１不可逆変換クエリとを生成する第１生成部と、前記第２データ群に含まれる複数の前記第２データそれぞれに、前記統合データ群に含まれる複数のデータに対して前記所定の確率でノイズが付与されているようにノイズを付与するクエリである第２ノイズ付与クエリと、前記第２データ群に含まれる複数の前記データ識別情報を前記所定の方法により不可逆変換するクエリである第２不可逆変換クエリとを生成する第２生成部と、前記第１生成部が生成した前記第１ノイズ付与クエリと前記第１不可逆変換クエリとを前記第１データ群の提供元に対応する第１装置に送信するとともに、前記第２生成部が生成した前記第２ノイズ付与クエリと前記第２不可逆変換クエリとを前記第２データ群の提供元に対応する第２装置に送信する送信部と、前記第１不可逆変換クエリに基づいて変換された前記データ識別情報と、前記第１ノイズ付与クエリに基づいてノイズが付与された複数の第１データとを関連付けた複数の第１レコードを含む変換後の第１データ群と、前記第２不可逆変換クエリに基づいて変換された前記データ識別情報と、前記第２ノイズ付与クエリに基づいてノイズが付与された複数の第２データとを関連付けた複数の第２レコードを含む変換後の第２データ群とを取得するデータ群取得部と、前記データ群取得部が取得した前記変換後の第１データ群に含まれる複数の第１レコードそれぞれの前記データ識別情報と、前記データ群取得部が取得した前記変換後の第２データ群に含まれる複数の第２レコードそれぞれの前記データ識別情報とに基づいて、前記変換後の第１データ群と前記変換後の第２データ群とを統合した前記統合データ群を生成する統合部と、を有する。 An information processing device according to a first aspect of the present invention includes: a first generation unit that generates a first noise-adding query and a first irreversible conversion query, where, of a first data group including a plurality of first records that associate data identification information for identifying data with first data and a second data group including a plurality of second records that associate the data identification information with second data, the first noise-adding query is a query that adds noise to each of the plurality of first data included in the first data group such that, when the first data group and the second data group are integrated to form an integrated data group, noise is added with a predetermined probability to the plurality of data included in the integrated data group, and the first irreversible conversion query is a query that irreversibly converts the plurality of data identification information included in the first data group by a predetermined method; a second generation unit that generates a second noise-adding query, which is a query that adds noise to each of the plurality of second data included in the second data group such that noise is added with the predetermined probability to the plurality of data included in the integrated data group, and a second irreversible conversion query, which is a query that irreversibly converts the plurality of data identification information included in the second data group by the predetermined method; a transmission unit that transmits the first noise-adding query and the first irreversible conversion query generated by the first generation unit to a first device corresponding to a provider of the first data group, and transmits the second noise-adding query and the second irreversible conversion query generated by the second generation unit to a second device corresponding to a provider of the second data group; a data group acquisition unit that acquires a converted first data group including a plurality of first records that associate the data identification information converted based on the first irreversible conversion query with a plurality of first data to which noise has been added based on the first noise-adding query, and a converted second data group including a plurality of second records that associate the data identification information converted based on the second irreversible conversion query with a plurality of second data to which noise has been added based on the second noise-adding query; and an integration unit that generates the integrated data group by integrating the converted first data group and the converted second data group based on the data identification information of each of the plurality of first records included in the converted first data group acquired by the data group acquisition unit and the data identification information of each of the plurality of second records included in the converted second data group acquired by the data group acquisition unit.

前記第１レコードは、ｎ_１個の属性それぞれに対応する複数の第１データを含み、前記第２レコードは、ｎ_２個の属性それぞれに対応する複数の第２データを含み、前記統合データ群に含まれる複数のデータは、プライバシーの強度を示すパラメータをεとするε－局所型差分プライバシーを満たしており、前記第１生成部は、前記ｎ_１個の属性それぞれの第１データにノイズが付与された場合に、ノイズが付与された前記ｎ_１個の属性それぞれの第１データがε_１－局所型差分プライバシー（ただし、プライバシーの強度を示すパラメータε_１はε_１＝ε／（ｎ_１＋ｎ_２）である）を満たすようにノイズを付与する前記第１ノイズ付与クエリを生成し、前記第２生成部は、前記ｎ_２個の属性それぞれの第２データにノイズが付与された場合に、ノイズが付与された前記ｎ_２個の属性それぞれの第２データがε_２－局所型差分プライバシー（ただし、プライバシーの強度を示すパラメータε_２はε_２＝ε／（ｎ_１＋ｎ_２）である）を満たすようにノイズを付与する前記第２ノイズ付与クエリを生成してもよい。 The first record may include a plurality of first data corresponding to each of n_1 attributes, the second record may include a plurality of second data corresponding to each of n_2 attributes, and the plurality of data included in the integrated data group may satisfy ε-local differential privacy, where ε is a parameter indicating the strength of privacy. In this case, the first generation unit may generate the first noise-adding query that adds noise such that the noise-added first data of each of the n_1 attributes satisfies ε_1-local differential privacy (where the parameter ε_1 indicating the strength of privacy is ε_1 = ε/(n_1 + n_2)), and the second generation unit may generate the second noise-adding query that adds noise such that the noise-added second data of each of the n_2 attributes satisfies ε_2-local differential privacy (where the parameter ε_2 indicating the strength of privacy is ε_2 = ε/(n_1 + n_2)).
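
The even split of ε is consistent with sequential composition of local differential privacy: when several attributes of one record are each released under their own budget, the budgets add up. As a worked check (the numeric values are illustrative assumptions, not taken from the source):

```latex
\[
n_1 \varepsilon_1 + n_2 \varepsilon_2
  = (n_1 + n_2)\,\frac{\varepsilon}{n_1 + n_2}
  = \varepsilon .
\]
\[
\text{For example, } n_1 = 4,\; n_2 = 3,\; \varepsilon = 7
  \;\Rightarrow\; \varepsilon_1 = \varepsilon_2 = \frac{7}{4 + 3} = 1 .
\]
```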

前記データ識別情報に関連付けられているデータ群はｋ個（ただし、ｋは３以上の整数）存在し、第ｋデータ群に含まれる複数の第ｋデータそれぞれに、ｋ個のデータ群を統合して前記統合データ群とした場合における、前記統合データ群に含まれる複数のデータに対して所定の確率でノイズが付与されているようにノイズを付与するクエリである第ｋノイズ付与クエリと、前記第ｋデータ群に含まれる複数の前記データ識別情報を所定の方法により不可逆変換するクエリである第ｋ不可逆変換クエリとを生成する第ｋ生成部をさらに有し、前記第ｋデータ群に含まれる第ｋレコードは、ｎ_ｋ個の属性それぞれに対応する複数の第ｋデータを含み、前記第ｋ生成部は、前記ｎ_ｋ個の属性それぞれの第ｋデータにノイズが付与された場合に、ノイズが付与された前記ｎ_ｋ個の属性それぞれの第ｋデータがε_ｋ－局所型差分プライバシー（ただし、プライバシーの強度を示すパラメータε_ｋはε_ｋ＝ε／（ｎ_１＋ｎ_２＋・・・＋ｎ_ｋ）である）を満たすようにノイズを付与する前記第ｋノイズ付与クエリを生成してもよい。 There may be k data groups (where k is an integer of 3 or more) associated with the data identification information, and the information processing device may further include a k-th generation unit that generates a k-th noise-adding query, which is a query that adds noise to each of a plurality of k-th data included in the k-th data group such that, when the k data groups are integrated to form the integrated data group, noise is added with a predetermined probability to the plurality of data included in the integrated data group, and a k-th irreversible conversion query, which is a query that irreversibly converts the plurality of data identification information included in the k-th data group by a predetermined method. A k-th record included in the k-th data group may include a plurality of k-th data corresponding to each of n_k attributes, and the k-th generation unit may generate the k-th noise-adding query that adds noise such that the noise-added k-th data of each of the n_k attributes satisfies ε_k-local differential privacy (where the parameter ε_k indicating the strength of privacy is ε_k = ε/(n_1 + n_2 + ... + n_k)).

前記情報処理装置は、前記第１装置から前記第１レコードを構成する複数の属性それぞれに対応する項目を示す第１項目情報を取得するとともに、前記第２装置から前記第２レコードを構成する複数の属性それぞれに対応する項目を示す第２項目情報を取得し、取得した前記第１項目情報に基づいて前記第１データに対応する属性の個数である前記ｎ_１を特定し、取得した前記第２項目情報に基づいて前記第２データに対応する属性の個数である前記ｎ_２を特定し、特定した属性の個数である前記ｎ_１及び前記ｎ_２に基づいて、ノイズが付与された後の前記第１データ及び前記第２データが満たす局所型差分プライバシーにおけるプライバシーの強度を示す第１パラメータε_１及び第２パラメータε_２を決定する決定部を有してもよい。 The information processing device may include a determination unit that acquires, from the first device, first item information indicating the items corresponding to each of the plurality of attributes constituting the first record, acquires, from the second device, second item information indicating the items corresponding to each of the plurality of attributes constituting the second record, identifies n_1, the number of attributes corresponding to the first data, based on the acquired first item information, identifies n_2, the number of attributes corresponding to the second data, based on the acquired second item information, and determines, based on the identified attribute counts n_1 and n_2, a first parameter ε_1 and a second parameter ε_2 indicating the strength of privacy in the local differential privacy satisfied by the first data and the second data after noise has been added.
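
The determination step can be sketched as follows. This is a minimal illustration, assuming each provider reports its item information as a plain list of attribute names that excludes the data ID column; the function name `determine_epsilons` and the item names in the usage note are hypothetical.

```python
def determine_epsilons(total_epsilon: float, item_lists: list[list[str]]) -> list[float]:
    """Return one per-attribute privacy parameter for each provider.

    item_lists[i] holds the attribute (item) names reported by provider i,
    assumed here to exclude the data ID column.
    """
    counts = [len(items) for items in item_lists]  # n_1, n_2, ...
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one attribute is required")
    # Every provider receives the same value: epsilon / (n_1 + n_2 + ...).
    return [total_epsilon / total for _ in counts]
```

For instance, `determine_epsilons(7.0, [["age", "groceries", "daily_necessities", "purchase_ranking"], ["gender", "visit_supermarket", "visit_park"]])` counts 4 and 3 attributes and assigns 1.0 to each provider. The same function covers three or more providers, since it sums over however many item lists it receives.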

前記第１生成部及び前記第２生成部の少なくともいずれかは、データ群に含まれる複数の属性それぞれに対応する複数のデータのうち、少なくとも一つの属性に対応するデータが取り得る値の数を減少させ、当該データが取り得る値の数を減少させた後に、複数のデータそれぞれに前記ノイズを付与する前記ノイズ付与クエリを生成してもよい。 At least one of the first generation unit and the second generation unit may generate the noise-adding query so that it first reduces the number of values that the data corresponding to at least one of the plurality of attributes included in the data group can take, and then adds the noise to each of the plurality of data.
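
A minimal sketch of why reducing the number of possible values helps, assuming k-ary randomized response as the noise mechanism (the source only says that noise is added with a predetermined probability; the mechanism, the bucket boundaries, and all function names here are assumptions for illustration):

```python
import math
import random

AGE_BUCKETS = ["0-19", "20-39", "40-59", "60+"]

def bucket_age(age: int) -> str:
    """Coarsen an exact age into one of four buckets (illustrative boundaries)."""
    if age < 20:
        return "0-19"
    if age < 40:
        return "20-39"
    if age < 60:
        return "40-59"
    return "60+"

def keep_probability(eps: float, domain_size: int) -> float:
    """Probability that k-ary randomized response reports the true value."""
    e = math.exp(eps)
    return e / (e + domain_size - 1)

def randomized_response(value: str, domain: list[str], eps: float,
                        rng: random.Random) -> str:
    """Report the true value with the keep probability; otherwise report a
    different value drawn uniformly from the rest of the domain."""
    if rng.random() < keep_probability(eps, len(domain)):
        return value
    return rng.choice([v for v in domain if v != value])
```

With a per-attribute parameter of 1, an attribute with 100 possible values keeps its true value with probability of about 0.027, while the same attribute coarsened to 4 buckets keeps it with probability of about 0.475, so the noisy data stays far more useful at the same privacy level.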

前記第１生成部は、前記変換後の第１データ群に含まれる前記第１レコードに含まれる第１データを、第１の割合で他の第１レコードに含まれる前記第１データと入れ替えることにより前記変換後の第１データ群を更新するクエリである第１更新クエリを生成し、前記第２生成部は、前記変換後の第２データ群に含まれる前記第２レコードに含まれる第２データを、第２の割合で他の第２レコードに含まれる前記第２データと入れ替えることにより前記変換後の第２データ群を更新するクエリである第２更新クエリを生成し、前記送信部は、前記第１生成部が生成した前記第１更新クエリを前記第１装置に送信し、前記第２生成部が生成した前記第２更新クエリを前記第２装置に送信してもよい。 The first generation unit may generate a first update query, which is a query that updates the converted first data group by replacing the first data included in the first record included in the converted first data group with the first data included in another first record at a first ratio, and the second generation unit may generate a second update query, which is a query that updates the converted second data group by replacing the second data included in the second record included in the converted second data group with the second data included in another second record at a second ratio, and the transmission unit may transmit the first update query generated by the first generation unit to the first device and transmit the second update query generated by the second generation unit to the second device.

前記情報処理装置は、前記統合データ群に含まれるレコードに含まれるデータを、第３の割合で他のレコードに含まれるデータと入れ替えることにより前記統合データ群を更新する更新部を有してもよい。 The information processing device may have an update unit that updates the integrated data set by replacing data included in records included in the integrated data set with data included in other records at a third ratio.
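
The record swapping used by the update queries and the update unit can be sketched as below. This is one possible reading, assuming that the whole non-ID payload of a record is exchanged and that the ratio selects the fraction of records taking part; the function name `swap_payloads` and the `data_id` key are hypothetical.

```python
import random

def swap_payloads(records: list[dict], ratio: float, rng: random.Random,
                  id_key: str = "data_id") -> list[dict]:
    """Permute the non-ID data among a randomly chosen fraction of records.

    All records are assumed to share the same keys. The input is not modified.
    """
    n_swap = int(len(records) * ratio)
    chosen = rng.sample(range(len(records)), n_swap)
    targets = chosen[:]
    rng.shuffle(targets)  # a record may be mapped to itself by chance
    updated = [dict(r) for r in records]
    for src, dst in zip(chosen, targets):
        for key, value in records[src].items():
            if key != id_key:
                updated[dst][key] = value
    return updated
```

Because the payloads are only permuted, column-wise statistics such as counts and averages are unchanged, while the link between a converted ID and its data is weakened for the selected records.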

前記第１生成部は、前記第１データ群に含まれる複数の前記データ識別情報それぞれに、ランダムデータを付加してから前記所定の方法により不可逆変換する前記第１不可逆変換クエリを生成し、前記第２生成部は、前記第２データ群に含まれる複数の前記データ識別情報それぞれに、当該データ識別情報に対応する、前記第１データ群に含まれるデータ識別情報に付加されたランダムデータと同一のランダムデータを付加してから前記所定の方法により不可逆変換する前記第２不可逆変換クエリを生成してもよい。 The first generation unit may generate the first irreversible conversion query by adding random data to each of the plurality of pieces of data identification information included in the first data group and then performing irreversible conversion using the predetermined method, and the second generation unit may generate the second irreversible conversion query by adding random data that is the same as the random data added to the data identification information included in the first data group and corresponds to the data identification information, to each of the plurality of pieces of data identification information included in the second data group and then performing irreversible conversion using the predetermined method.
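
A minimal sketch of this salted conversion, assuming SHA-256 as the predetermined method and a single salt shared by both providers (the source allows random data per data ID; the hash function, the shared-salt simplification, and the function names are assumptions):

```python
import hashlib
import secrets

def make_salt(num_bytes: int = 16) -> str:
    """Generate the random data once, to be sent to both providers."""
    return secrets.token_hex(num_bytes)

def convert_id(data_id: str, salt: str) -> str:
    """Irreversibly convert a data ID by hashing it together with the salt."""
    return hashlib.sha256((salt + data_id).encode("utf-8")).hexdigest()
```

Because both providers apply the same salt, the same original data ID yields the same converted ID on each side, so the converted data groups can still be matched. Without knowledge of the salt, a party cannot rebuild the mapping simply by hashing a list of candidate IDs.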

前記統合部は、前記統合データ群をさらに加工して統計データを生成し、当該統計データに対して前記ノイズの付与に用いられる前記所定の確率を用いて前記ノイズを除去するよう補正を行ってもよい。 The integration unit may further process the integrated data group to generate statistical data, and may correct the statistical data to remove the noise using the predetermined probability used to add the noise.
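
As a hedged illustration of such a correction, assume the noise was k-ary randomized response with a known keep probability p (an assumption; the source does not fix the mechanism). The observed share of a value then relates linearly to its true share, and the relation can be inverted. The function name `debias_frequency` is hypothetical.

```python
def debias_frequency(observed: float, keep_prob: float, domain_size: int) -> float:
    """Estimate the true share of a value from its share in the noisy data.

    Under k-ary randomized response, observed = p * f + q * (1 - f), where
    p is the keep probability and q = (1 - p) / (k - 1).
    """
    q = (1.0 - keep_prob) / (domain_size - 1)
    if keep_prob == q:
        raise ValueError("noise is uniform; the true share cannot be recovered")
    return (observed - q) / (keep_prob - q)
```

For example, a value whose true share is 0.3, perturbed with p = 0.75 over a two-value domain, appears with an expected share of 0.4, and the correction maps 0.4 back to 0.3. On finite samples the estimate can fall slightly outside the range 0 to 1.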

前記第１データ群及び前記第２データ群には、新たに追加されたレコードを特定するために用いることができる特定用データが含まれており、前記第１生成部は、前記第１不可逆変換クエリを再生成する場合、前記特定用データに基づいて新たに追加された第１レコードに対して前記ノイズを付与する前記第１ノイズ付与クエリを生成し、前記第２生成部は、前記第２不可逆変換クエリを再生成する場合、前記特定用データに基づいて新たに追加された第２レコードに対して前記ノイズを付与する前記第２ノイズ付与クエリを生成してもよい。 The first data group and the second data group may include identification data that can be used to identify a newly added record, and when regenerating the first irreversible conversion query, the first generation unit may generate the first noise-adding query that adds the noise to the newly added first record based on the identification data, and when regenerating the second irreversible conversion query, the second generation unit may generate the second noise-adding query that adds the noise to the newly added second record based on the identification data.

前記統合部は、前記変換後の第１データ群に含まれる複数の第１レコードそれぞれの前記データ識別情報と、前記変換後の第２データ群に含まれる複数の第２レコードそれぞれの前記データ識別情報とに基づいて、前記変換後の第１データ群と前記変換後の第２データ群とを統合し、前記データ識別情報を除外して、前記統合データ群を生成してもよい。 The integration unit may integrate the converted first data group and the converted second data group based on the data identification information of each of a plurality of first records included in the converted first data group and the data identification information of each of a plurality of second records included in the converted second data group, and may generate the integrated data group by excluding the data identification information.
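
The integration with removal of the data identification information can be sketched as follows, assuming records are dictionaries keyed by a hypothetical `data_id` field and that records without a counterpart in the other group are skipped (the source does not say how unmatched records are handled):

```python
def integrate(first: list[dict], second: list[dict],
              id_key: str = "data_id") -> list[dict]:
    """Join two converted data groups on the converted ID, then drop the ID."""
    second_by_id = {rec[id_key]: rec for rec in second}
    integrated = []
    for rec1 in first:
        rec2 = second_by_id.get(rec1[id_key])
        if rec2 is None:
            continue  # assumption: unmatched records are skipped (inner join)
        merged = {**rec1, **rec2}
        del merged[id_key]  # the converted ID is only needed for matching
        integrated.append(merged)
    return integrated
```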

本発明の第２の態様に係る情報処理方法は、情報処理装置が実行する、データを識別するためのデータ識別情報と第１データとを関連付けた複数の第１レコードを含む第１データ群と、前記データ識別情報と第２データとを関連付けた複数の第２レコードを含む第２データ群とのうちの前記第１データ群に含まれる複数の前記第１データそれぞれに、前記第１データ群と前記第２データ群とを統合して統合データ群とした場合における、前記統合データ群に含まれる複数のデータに対して所定の確率でノイズが付与されているようにノイズを付与するクエリである第１ノイズ付与クエリと、前記第１データ群に含まれる複数の前記データ識別情報を所定の方法により不可逆変換するクエリである第１不可逆変換クエリとを生成するステップと、前記第２データ群に含まれる複数の前記第２データそれぞれに、前記統合データ群に含まれる複数のデータに対して前記所定の確率でノイズが付与されているようにノイズを付与するクエリである第２ノイズ付与クエリと、前記第２データ群に含まれる複数の前記データ識別情報を前記所定の方法により不可逆変換するクエリである第２不可逆変換クエリとを生成するステップと、生成した前記第１ノイズ付与クエリと前記第１不可逆変換クエリとを前記第１データ群の提供元に対応する第１装置に送信するとともに、生成した前記第２ノイズ付与クエリと前記第２不可逆変換クエリとを前記第２データ群の提供元に対応する第２装置に送信するステップと、前記第１不可逆変換クエリに基づいて変換された前記データ識別情報と、前記第１ノイズ付与クエリに基づいてノイズが付与された複数の第１データとを関連付けた複数の第１レコードを含む変換後の第１データ群と、前記第２不可逆変換クエリに基づいて変換された前記データ識別情報と、前記第２ノイズ付与クエリに基づいてノイズが付与された複数の第２データとを関連付けた複数の第２レコードを含む変換後の第２データ群とを取得するステップと、取得した前記変換後の第１データ群に含まれる複数の第１レコードそれぞれの前記データ識別情報と、取得した前記変換後の第２データ群に含まれる複数の第２レコードそれぞれの前記データ識別情報とに基づいて、前記変換後の第１データ群と前記変換後の第２データ群とを統合した前記統合データ群を生成するステップと、を有する。 An information processing method according to a second aspect of the present invention is executed by an information processing device and includes: a step of generating a first noise-adding query and a first irreversible conversion query, where, of a first data group including a plurality of first records that associate data identification information for identifying data with first data and a second data group including a plurality of second records that associate the data identification information with second data, the first noise-adding query is a query that adds noise to each of the plurality of first data included in the first data group such that, when the first data group and the second data group are integrated to form an integrated data group, noise is added with a predetermined probability to the plurality of data included in the integrated data group, and the first irreversible conversion query is a query that irreversibly converts the plurality of data identification information included in the first data group by a predetermined method; a step of generating a second noise-adding query, which is a query that adds noise to each of the plurality of second data included in the second data group such that noise is added with the predetermined probability to the plurality of data included in the integrated data group, and a second irreversible conversion query, which is a query that irreversibly converts the plurality of data identification information included in the second data group by the predetermined method; a step of transmitting the generated first noise-adding query and first irreversible conversion query to a first device corresponding to a provider of the first data group, and transmitting the generated second noise-adding query and second irreversible conversion query to a second device corresponding to a provider of the second data group; a step of acquiring a converted first data group including a plurality of first records that associate the data identification information converted based on the first irreversible conversion query with a plurality of first data to which noise has been added based on the first noise-adding query, and a converted second data group including a plurality of second records that associate the data identification information converted based on the second irreversible conversion query with a plurality of second data to which noise has been added based on the second noise-adding query; and a step of generating the integrated data group by integrating the converted first data group and the converted second data group based on the data identification information of each of the plurality of first records included in the acquired converted first data group and the data identification information of each of the plurality of second records included in the acquired converted second data group.

本発明によれば、ユーザ情報を収集する過程で、匿名化処理が行われていないユーザ情報が流出しないようにすることができるという効果を奏する。 The present invention has the effect of preventing the leakage of user information that has not been anonymized during the process of collecting user information.

情報処理システムの概要を説明する図である。FIG. 1 is a diagram illustrating an overview of an information processing system. 情報処理装置の機能構成を示す図である。FIG. 2 is a diagram illustrating a functional configuration of an information processing device. 第１データ群と第２データ群との一例を示す図である。FIG. 3 is a diagram illustrating an example of a first data group and a second data group. 変換後の第１データ群と変換後の第２データ群との一例を示す図である。FIG. 4 is a diagram illustrating an example of a first data group after conversion and a second data group after conversion. 統合データ群の一例を示す図である。FIG. 5 is a diagram illustrating an example of an integrated data group. 情報処理装置が統合データ群を生成するまでの処理の流れを示すシーケンス図である。FIG. 6 is a sequence diagram showing the flow of processing up to when the information processing device generates an integrated data group.

［情報処理システムＳの概要］ [Overview of Information Processing System S]
図１は、情報処理システムＳの概要を説明する図である。情報処理システムＳは、情報処理装置１と、第１データ群を管理する第１装置２と、第２データ群を管理する第２装置３とを有し、第１データ群及び第２データ群に含まれるユーザ情報の匿名化を行ったうえで第１データ群と第２データ群とを統合した統合データ群を生成するシステムである。 FIG. 1 is a diagram illustrating an overview of an information processing system S. The information processing system S includes an information processing device 1, a first device 2 that manages a first data group, and a second device 3 that manages a second data group, and is a system that generates an integrated data group by integrating the first data group and the second data group after anonymizing the user information included in the first data group and the second data group.

情報処理装置１は、例えばデータを集約し、集約後のデータを提供するサービスを提供する集約事業者により運用されており、第１装置２及び第２装置３等の外部装置と、インターネットや携帯電話回線等の通信ネットワーク（不図示）を介して通信可能に接続されている。 The information processing device 1 is operated by an aggregator that provides a service of aggregating data and providing the aggregated data, and is communicatively connected to external devices such as a first device 2 and a second device 3 via a communication network (not shown) such as the Internet or a mobile phone line.

第１装置２は、例えば第１の事業者により運用されており、データを識別するためのデータ識別情報としてのデータＩＤと第１データとを関連付けた複数の第１レコードを含む第１データ群を管理している。第２装置３は、例えば第２の事業者により運用されており、第１データ群に含まれているデータＩＤと共通のデータＩＤと、第２データとを関連付けた複数の第２レコードを含む第２データ群を管理している。 The first device 2 is operated, for example, by a first business operator, and manages a first data group including a plurality of first records that associate the first data with a data ID as data identification information for identifying the data. The second device 3 is operated, for example, by a second business operator, and manages a second data group including a plurality of second records that associate the second data with a data ID common to the data IDs included in the first data group.

情報処理装置１は、第１データ群に含まれる複数の第１データそれぞれにノイズを付与する第１ノイズ付与クエリと、第１データ群に含まれる複数のデータＩＤを所定の方法により不可逆変換するクエリである第１不可逆変換クエリとを生成する。第１ノイズ付与クエリは、例えば、第１データ群に含まれる複数の第１データそれぞれに、第１データ群と第２データ群とを統合して統合データ群とした場合における、統合データ群に含まれる複数のデータに対して所定の確率でノイズが付与されているようにノイズを付与するクエリである。クエリは、例えば、リレーショナルデータベース管理システムにおいて実行可能なＳＱＬ（Structured Query Language）文であるものとする。 The information processing device 1 generates a first noise-adding query that adds noise to each of a plurality of first data included in the first data group, and a first irreversible conversion query that is a query that irreversibly converts a plurality of data IDs included in the first data group by a predetermined method. The first noise-adding query is, for example, a query that adds noise to each of a plurality of first data included in the first data group such that noise is added with a predetermined probability to a plurality of data included in an integrated data group when the first data group and the second data group are integrated into an integrated data group. The query is, for example, a Structured Query Language (SQL) statement that can be executed in a relational database management system.
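
To make the query-based flow concrete, the sketch below builds the two SQL statements as strings and runs them on the provider side with SQLite. Everything specific is an assumption for illustration: the attributes are 0/1 flags perturbed by flipping, the conversion is an unsalted SHA-256 hash, the SQL functions `rand_unit` and `sha256_hex` are helpers registered by the provider, and the statements are meant to run on a working copy of the table rather than on the master data.

```python
import hashlib
import random
import sqlite3

def build_noise_query(table: str, binary_cols: list[str], keep_prob: float) -> str:
    """SQL that keeps each 0/1 value with probability keep_prob, else flips it."""
    assignments = ", ".join(
        f"{col} = CASE WHEN rand_unit() < {keep_prob} THEN {col} ELSE 1 - {col} END"
        for col in binary_cols
    )
    # Table and column names are trusted inputs in this sketch.
    return f"UPDATE {table} SET {assignments}"

def build_conversion_query(table: str, id_col: str) -> str:
    """SQL that replaces each data ID with its SHA-256 digest."""
    return f"UPDATE {table} SET {id_col} = sha256_hex({id_col})"

def run_on_provider(conn: sqlite3.Connection, queries: list[str],
                    rng: random.Random) -> None:
    """Provider side: register helper functions, then run the received queries."""
    conn.create_function("rand_unit", 0, rng.random)
    conn.create_function(
        "sha256_hex", 1,
        lambda s: hashlib.sha256(s.encode("utf-8")).hexdigest(),
    )
    for query in queries:
        conn.execute(query)
    conn.commit()
```

The aggregator only ever handles the query text and the converted result; the raw table stays inside the provider's database, which is the point of shipping queries instead of collecting data first.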

情報処理装置１は、第２データ群に含まれる複数の第２データそれぞれにノイズを付与する第２ノイズ付与クエリと、第２データ群に含まれる複数のデータＩＤを所定の方法により不可逆変換するクエリである第２不可逆変換クエリとを生成する。第２ノイズ付与クエリは、第１ノイズ付与クエリと同様に、第２データ群に含まれる複数の第２データそれぞれに、第１データ群と第２データ群とを統合して統合データ群とした場合における、統合データ群に含まれる複数のデータに対して所定の確率でノイズが付与されているようにノイズを付与するクエリである。 The information processing device 1 generates a second noise-adding query that adds noise to each of the multiple second data included in the second data group, and a second irreversible conversion query that is a query that irreversibly converts multiple data IDs included in the second data group by a predetermined method. The second noise-adding query, like the first noise-adding query, is a query that adds noise to each of the multiple second data included in the second data group in such a way that when the first data group and the second data group are integrated into an integrated data group, noise is added with a predetermined probability to the multiple data included in the integrated data group.

情報処理装置１は、生成した第１ノイズ付与クエリと、第１不可逆変換クエリとを第１装置２に送信するとともに、生成した第２ノイズ付与クエリと、第２不可逆変換クエリとを第２装置３に送信する。 The information processing device 1 transmits the generated first noise-adding query and first irreversible conversion query to the first device 2, and transmits the generated second noise-adding query and second irreversible conversion query to the second device 3.

第１装置２は、情報処理装置１から受信した第１ノイズ付与クエリと、第１不可逆変換クエリとを実行し、第１不可逆変換クエリに基づいて変換されたデータＩＤと、第１ノイズ付与クエリに基づいてノイズが付与された複数の第１データとを関連付けた複数の第１レコードを含む変換後の第１データ群を生成する。第１装置２は、生成した変換後の第１データ群を情報処理装置１に送信する。 The first device 2 executes the first noise-adding query and the first irreversible conversion query received from the information processing device 1, and generates a converted first data group including a plurality of first records that associate the data IDs converted based on the first irreversible conversion query with the plurality of first data to which noise has been added based on the first noise-adding query. The first device 2 transmits the generated converted first data group to the information processing device 1.

第２装置３は、情報処理装置１から受信した第２ノイズ付与クエリと、第２不可逆変換クエリとを実行し、第２不可逆変換クエリに基づいて変換されたデータＩＤと、第２ノイズ付与クエリに基づいてノイズが付与された複数の第２データとを関連付けた複数の第２レコードを含む変換後の第２データ群を生成する。第２装置３は、生成した変換後の第２データ群を情報処理装置１に送信する。 The second device 3 executes the second noise-adding query and the second irreversible conversion query received from the information processing device 1, and generates a converted second data group including a plurality of second records that associate the data IDs converted based on the second irreversible conversion query with the plurality of second data to which noise has been added based on the second noise-adding query. The second device 3 transmits the generated converted second data group to the information processing device 1.

このように、第１装置２及び第２装置３それぞれにおいて、データ群の匿名化処理を行ったうえでデータ群を情報処理装置１に送信することができるので、ユーザ情報を収集する過程で、匿名化処理が行われていないユーザ情報が流出しないようにすることができる。 In this way, the first device 2 and the second device 3 can each perform anonymization processing on the data group before transmitting the data group to the information processing device 1, so that user information that has not been anonymized can be prevented from leaking during the process of collecting user information.

情報処理装置１は、第１装置２から受信した変換後の第１データ群に含まれる複数の第１レコードそれぞれのデータＩＤと、第２装置３から受信した変換後の第２データ群に含まれる複数の第２レコードそれぞれのデータＩＤとに基づいて、変換後の第１データ群と変換後の第２データ群とを統合した統合データ群を生成する。 The information processing device 1 generates an integrated data group by integrating the converted first data group and the converted second data group based on the data IDs of each of the multiple first records included in the converted first data group received from the first device 2 and the data IDs of each of the multiple second records included in the converted second data group received from the second device 3.

このようにして統合された統合データ群に含まれる複数のデータには、所定の確率でノイズが付与されることとなる。また、データＩＤは、不可逆変換クエリにより変換されることから、変換後のデータＩＤに基づいて個人を特定するのが困難となる。これにより、情報処理装置１は、統合データに含まれるユーザ情報のプライバシーを確保することができる。 As a result, noise is added with the predetermined probability to the plurality of data included in the integrated data group integrated in this way. In addition, since the data IDs are converted by the irreversible conversion queries, it becomes difficult to identify an individual from a converted data ID. This allows the information processing device 1 to ensure the privacy of the user information included in the integrated data.

［情報処理装置１の機能構成］ [Functional Configuration of Information Processing Device 1]
続いて、情報処理装置１の機能構成について説明する。図２は、情報処理装置１の機能構成を示す図である。 Next, the functional configuration of the information processing device 1 will be described. FIG. 2 is a diagram showing the functional configuration of the information processing device 1.

図２に示すように、情報処理装置１は、通信部１１と、記憶部１２と、制御部１３とを有する。 As shown in FIG. 2, the information processing device 1 includes a communication unit 11, a storage unit 12, and a control unit 13.
通信部１１は、第１装置２及び第２装置３等と通信ネットワークを介してデータを送受信するための通信インターフェースである。 The communication unit 11 is a communication interface for transmitting and receiving data to and from the first device 2, the second device 3, and the like via a communication network.

記憶部１２は、各種のデータを記憶する記憶媒体であり、ＲＯＭ（Read Only Memory）、ＲＡＭ（Random Access Memory）、ハードディスク、ＳＳＤ（Solid State Drive）、及びフラッシュメモリ等を有する。記憶部１２は、制御部１３が実行するプログラムを記憶する。記憶部１２は、制御部１３を、決定部１３１、生成部１３２、送信部１３３、データ群取得部１３４及び統合部１３５として機能させるプログラムを記憶する。 The storage unit 12 is a storage medium that stores various types of data, and includes a ROM (Read Only Memory), a RAM (Random Access Memory), a hard disk, an SSD (Solid State Drive), and a flash memory. The storage unit 12 stores programs executed by the control unit 13. The storage unit 12 stores programs that cause the control unit 13 to function as a determination unit 131, a generation unit 132, a transmission unit 133, a data group acquisition unit 134, and an integration unit 135.

制御部１３は、例えばＣＰＵ（Central Processing Unit）である。制御部１３は、記憶部１２に記憶されたプログラムを実行することにより、決定部１３１、生成部１３２、送信部１３３、データ群取得部１３４及び統合部１３５として機能する。 The control unit 13 is, for example, a CPU (Central Processing Unit). The control unit 13 executes the programs stored in the storage unit 12, thereby functioning as a determination unit 131, a generation unit 132, a transmission unit 133, a data group acquisition unit 134, and an integration unit 135.

以下、制御部１３が有する機能について説明するにあたり、第１データ群と、第２データ群とについて説明する。図３は、第１データ群と第２データ群との一例を示す図である。図３において、（Ａ）は第１データ群を示しており、（Ｂ）は第２データ群を示している。 Below, the first data group and the second data group will be described in order to explain the functions of the control unit 13. FIG. 3 is a diagram showing an example of the first data group and the second data group. In FIG. 3, (A) shows the first data group, and (B) shows the second data group.

第１データ群は、第１の事業者が管理するデータ群であり、第１装置２に設けられたデータベース、又は第１装置２がアクセス可能なサーバに設けられたデータベースに格納されている。図３に示すように、第１データ群は、データを識別するためのデータ識別情報としてのデータＩＤと、ｎ_１個の属性それぞれに対応する複数の第１データとを関連付けた複数の第１レコードを含んでいる。データＩＤは、例えば、第１の事業者と第２事業者とがユーザに対して付与している共通のユーザＩＤである。 The first data group is a data group managed by the first business operator, and is stored in a database provided in the first device 2 or in a database provided in a server accessible to the first device 2. As shown in FIG. 3, the first data group includes a plurality of first records that associate a data ID, serving as data identification information for identifying data, with a plurality of first data corresponding to each of n_1 attributes. The data ID is, for example, a common user ID assigned to a user by both the first business operator and the second business operator.

図３に示す例では、第１データ群は、第１の事業者が運営する店舗における売上とユーザの年齢とを関連付けたデータ群であり、複数の属性それぞれに対応する「年齢」、「商品カテゴリ食料品」、「商品カテゴリ日用品」、「購入ランキング」という項目の第１データが含まれている。第１データ群は、１つのテーブル、又は複数のテーブルを連結することにより生成されたテーブルを示すものとするが、これに限らず、一以上のテーブルを参照するビューであってもよい。 In the example shown in FIG. 3, the first data group is a data group that associates sales at a store operated by a first business operator with the age of the user, and includes first data for the items "age," "product category groceries," "product category daily necessities," and "purchase ranking" that correspond to each of a plurality of attributes. The first data group represents one table or a table generated by concatenating a plurality of tables, but is not limited to this and may be a view that references one or more tables.

第２データ群は、第２の事業者が管理するデータ群であり、第２装置３に設けられたデータベース、又は第２装置３がアクセス可能なサーバに設けられたデータベースに格納されている。図３に示すように、第２データ群は、データを識別するためのデータ識別情報としてのデータＩＤと、ｎ_２個の属性それぞれに対応する複数の第２データとを関連付けた複数の第２レコードを含んでいる。図３に示す例では、第２データ群は、ユーザの年齢と、施設への訪問履歴とを関連付けたデータ群であり、複数の属性それぞれに対応する「性別」、「訪問場所スーパー」、「訪問場所公園」という項目の第２データが含まれている。第２データ群は、１つのテーブル、又は複数のテーブルを連結することにより生成されたテーブルを示すものとするが、これに限らず、一以上のテーブルを参照するビューであってもよい。 The second data group is a data group managed by the second business operator, and is stored in a database provided in the second device 3 or a database provided in a server accessible to the second device 3. As shown in FIG. 3, the second data group includes a plurality of second records in which a data ID as data identification information for identifying data is associated with a plurality of second data corresponding to each of n ₂ attributes. In the example shown in FIG. 3, the second data group is a data group in which the user's age is associated with a visit history to a facility, and includes second data of the items "gender", "visited location supermarket", and "visited location park" corresponding to each of the multiple attributes. The second data group indicates one table or a table generated by linking multiple tables, but is not limited to this, and may be a view that references one or more tables.

第１データ群と、第２データ群とには、同一のユーザの情報が含まれており、第１データ群と第２データ群とにおいて、同一のユーザのデータＩＤは共通であるものとする。これにより、データＩＤをキーとして第１データ群に含まれる第１レコードと第２データ群に含まれる第２レコードとを連結することができる。 The first data group and the second data group contain information about the same user, and the data ID of the same user is common in the first data group and the second data group. This makes it possible to link a first record included in the first data group and a second record included in the second data group using the data ID as a key.

続いて、制御部１３が有する機能について説明する。 Next, the functions of the control unit 13 will be described.
決定部１３１は、第１データ群と第２データ群とを統合した統合データ群に含まれる複数のデータに対して所定の確率でノイズが付与されるように、第１データ群にノイズが付与される確率と、第２データ群にノイズが付与される確率とを決定する。 The determination unit 131 determines the probability that noise will be added to the first data group and the probability that noise will be added to the second data group, so that noise is added with a predetermined probability to multiple data included in the integrated data group obtained by integrating the first data group and the second data group.

決定部１３１は、ノイズが付与された後の第１データ及び第２データが満たす局所型差分プライバシーにおけるプライバシーの強度を示す第１パラメータε_１及び第２パラメータε_２を決定する。 The determination unit 131 determines a first parameter ε₁ and a second parameter ε₂ that indicate the strength of privacy in the local differential privacy satisfied by the first data and the second data after noise is added.

決定部１３１が第１パラメータε_１及び第２パラメータε_２を決定するにあたり、局所型差分プライバシーについて説明する。まず、あるデータ群における任意のデータペアをｘ１、ｘ２とする。そして、データｘに対し、ランダムでノイズを付与する関数をＲ（ｘ）とし、その出力をｙとした場合に、以下の式（１）が成立するとき、関数Ｒは局所型差分プライバシーを満たすと定義される。 Before explaining how the determination unit 131 determines the first parameter ε₁ and the second parameter ε₂, local differential privacy is described. First, let x₁ and x₂ be an arbitrary pair of data in a certain data group. Let R(x) be a function that randomly adds noise to data x, and let y be its output. The function R is defined as satisfying local differential privacy when the following formula (1) holds.

ここで、Ｐｒ［］は、確率変数である。また、ｅは自然対数であり、εはプライバシーの強度を示すパラメータである。また、εはプライバシーの強度がεである局所型差分プライバシーをε－局所型差分プライバシーという。 Here, Pr[ ] denotes probability. Also, e is the base of the natural logarithm, and ε is a parameter indicating the strength of privacy. Local differential privacy whose privacy strength is ε is called ε-local differential privacy.
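
Formula (1) does not survive in this text. As a reference point, the textbook definition of ε-local differential privacy, written with the symbols introduced above (x₁, x₂, R, y, ε), is given below. It is assumed, not confirmed, to be the form the specification intends.

```latex
\Pr[R(x_1) = y] \le e^{\varepsilon} \, \Pr[R(x_2) = y]
\quad \text{for all inputs } x_1, x_2 \text{ and all outputs } y
```

A smaller ε forces the output distributions for any two inputs closer together, which corresponds to stronger privacy.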

ε－局所型差分プライバシーが満たされるようなデータの加工例としては、以下に示す加工例が挙げられる。例えば、データｘがｋ個の値を取り得るものとした場合、以下の式（２）に基づいて、データｘの入力に対して、データｙが出力される。 The following is an example of data processing that satisfies ε-local differential privacy. For example, if data x can take k values, data y is output for input of data x based on the following formula (2).
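
Formula (2) is not reproduced in this text. One widely used mechanism that satisfies ε-local differential privacy for data taking k values is k-ary randomized response, and the sketch below assumes that mechanism. The function names, the `domain` argument, and the probability e^ε/(k−1+e^ε) are assumptions for illustration and are not taken from the specification.

```python
import math
import random


def keep_probability(epsilon: float, k: int) -> float:
    """Probability that the true value is reported unchanged (assumed k-ary randomized response)."""
    return math.exp(epsilon) / (k - 1 + math.exp(epsilon))


def randomized_response(x, domain, epsilon, rng=random):
    """Return x with probability e^eps / (k - 1 + e^eps); otherwise return
    one of the other k - 1 values, chosen uniformly at random."""
    k = len(domain)
    if k < 2:
        return x  # a single-valued attribute carries no information to hide
    if rng.random() < keep_probability(epsilon, k):
        return x
    others = [v for v in domain if v != x]
    return rng.choice(others)
```

Under this assumption, the ratio between the probability of reporting the true value and the probability of reporting any particular other value is exactly e^ε, and a smaller ε lowers the keep probability, which agrees with the statement that a smaller ε converts data x to another value more often.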

決定部１３１は、第１装置２から第１レコードを構成する複数の属性それぞれに対応する項目を示す第１項目情報を取得するとともに、第２装置３から第２レコードを構成する複数の属性それぞれに対応する項目を示す第２項目情報を取得する。項目情報は、第１レコードに含まれる複数の項目のうち、統合データに含める項目を示す情報である。図３に示す例では、決定部１３１は、「年齢」、「商品カテゴリ食料品」、「商品カテゴリ日用品」、「購入ランキング」という４つの項目を示す第１項目情報を取得する。また、決定部１３１は、「性別」、「訪問場所スーパー」、「訪問場所公園」という３つの項目を示す第１項目情報を取得する。 The determination unit 131 acquires, from the first device 2, first item information indicating the items corresponding to each of the multiple attributes constituting the first record, and acquires, from the second device 3, second item information indicating the items corresponding to each of the multiple attributes constituting the second record. The item information indicates which of the multiple items included in the first record are to be included in the integrated data. In the example shown in FIG. 3, the determination unit 131 acquires first item information indicating four items, namely "age," "product category groceries," "product category daily necessities," and "purchase ranking." The determination unit 131 also acquires second item information indicating three items, namely "gender," "visited location supermarket," and "visited location park."

決定部１３１は、取得した第１項目情報に基づいて第１データに対応する属性の個数であるｎ_１を特定し、取得した第２項目情報に基づいて第２データに対応する属性の個数であるｎ_２を特定する。第１データに対応する属性の個数であるｎ_１と、第２データに対応する属性の個数であるｎ_２との和は、統合データ群に含まれる属性の個数である。決定部１３１は、特定した属性の個数であるｎ_１及びｎ_２に基づいて、ノイズが付与された後の複数の属性それぞれの第１データ及び第２データが満たす局所型差分プライバシーにおけるプライバシーの強度を示す第１パラメータε_１及び第２パラメータε_２を決定する。例えば、決定部１３１は、以下の式（３）に示すように、複数の属性それぞれのデータが、（ε／ｎ_１＋ｎ_２）－局所型差分プライバシーが適用されるように、第１パラメータε_１及び第２パラメータε_２を決定する。 The determination unit 131 identifies n₁, the number of attributes corresponding to the first data, based on the acquired first item information, and identifies n₂, the number of attributes corresponding to the second data, based on the acquired second item information. The sum of n₁ and n₂ is the number of attributes included in the integrated data group. Based on the identified numbers of attributes n₁ and n₂, the determination unit 131 determines a first parameter ε₁ and a second parameter ε₂ indicating the strength of privacy in the local differential privacy satisfied by the first data and the second data of each of the multiple attributes after noise is added. For example, the determination unit 131 determines the first parameter ε₁ and the second parameter ε₂ so that (ε/(n₁+n₂))-local differential privacy is applied to the data of each of the multiple attributes, as shown in the following formula (3), that is, ε₁ = ε₂ = ε/(n₁+n₂).

これにより、（ε／ｎ_１＋ｎ_２）－局所型差分プライバシーが適用された、ｎ_１＋ｎ_２の属性のデータを集約した統合データ群は、ε－局所型差分プライバシーが満たされることとなる。 As a result, the integrated data group, which aggregates the data of n₁+n₂ attributes to each of which (ε/(n₁+n₂))-local differential privacy has been applied, satisfies ε-local differential privacy.
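
The budget split described above can be sketched as follows. The function name and the value ε = 1.0 are hypothetical. The attribute counts 4 and 3 come from the FIG. 3 example. The check at the end relies on the usual sequential-composition property, under which the privacy parameters of independently randomized attributes add up.

```python
def per_attribute_epsilon(total_epsilon: float, n1: int, n2: int) -> float:
    """Split the overall budget evenly over all n1 + n2 attributes."""
    if n1 + n2 <= 0:
        raise ValueError("at least one attribute is required")
    return total_epsilon / (n1 + n2)


n1, n2 = 4, 3            # attribute counts from the FIG. 3 example
total_epsilon = 1.0      # illustrative value, not from the specification
epsilon_1 = epsilon_2 = per_attribute_epsilon(total_epsilon, n1, n2)

# Summing the per-attribute budgets over all attributes recovers the total.
composed = n1 * epsilon_1 + n2 * epsilon_2
```

With these values each attribute receives ε/7, so the first device and the second device can randomize their own attributes independently while the joined record stays within the overall budget ε.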

生成部１３２は、第１生成部として機能し、第１ノイズ付与クエリを生成する。第１ノイズ付与クエリは、第１データ群に含まれる複数の第１データそれぞれに、第１データ群と第２データ群とを統合して統合データ群とした場合における、統合データ群に含まれる複数のデータに対して所定の確率でノイズが付与されているようにノイズを付与するクエリである。所定の確率は、ε－局所型差分プライバシーが満たされる確率であり、プライバシーの強度を示すパラメータεにより決定されるものとする。例えば、所定の確率は、式（２）を用いて算出される。式（２）に示されるように、εが小さいほど、高い確率でデータｘが他の値に変換される。 The generation unit 132 functions as a first generation unit and generates a first noise-adding query. The first noise-adding query is a query that adds noise to each of the multiple first data included in the first data group such that, when the first data group and the second data group are integrated into an integrated data group, noise has been added with a predetermined probability to the multiple data included in the integrated data group. The predetermined probability is a probability at which ε-local differential privacy is satisfied, and is determined by the parameter ε indicating the strength of privacy. For example, the predetermined probability is calculated using formula (2). As shown in formula (2), the smaller ε is, the higher the probability that data x is converted to another value.

例えば、生成部１３２は、第１項目情報が示すｎ_１個の属性それぞれの第１データにノイズが付与された場合に、ノイズが付与されたｎ_１個の属性それぞれの第１データがε_１－局所型差分プライバシーを満たすようにノイズを付与する第１ノイズ付与クエリを生成する。ここで、ε_１は、決定部１３１が決定した第１パラメータである。 For example, the generation unit 132 generates a first noise-adding query that adds noise to the first data of each of the n₁ attributes indicated by the first item information such that the noised first data of each of the n₁ attributes satisfies ε₁-local differential privacy. Here, ε₁ is the first parameter determined by the determination unit 131.

また、生成部１３２は、第１データ群に含まれる複数のデータ識別情報としてのデータＩＤを所定の方法により不可逆変換するクエリである第１不可逆変換クエリを生成する。所定の方法は、例えば、ハッシュ関数を用いてデータＩＤを不可逆変換する方法であるが、これに限らず、不可逆変換可能な方法であれば他の方法を用いてもよい。 The generating unit 132 also generates a first irreversible conversion query, which is a query that irreversibly converts data IDs, which are multiple pieces of data identification information included in the first data group, using a predetermined method. The predetermined method is, for example, a method of irreversibly converting the data IDs using a hash function, but is not limited to this, and any other method that allows irreversible conversion may be used.

また、生成部１３２は、第２生成部として機能し、第２ノイズ付与クエリを生成する。第２ノイズ付与クエリは、第２データ群に含まれる複数の第２データそれぞれに、統合データ群に含まれる複数のデータに対して所定の確率でノイズが付与されているようにノイズを付与するクエリである。例えば、生成部１３２は、第２項目情報が示すｎ_２個の属性それぞれの第２データにノイズが付与された場合に、ノイズが付与されたｎ_２個の属性それぞれの第２データがε_２－局所型差分プライバシーを満たすようにノイズを付与する第２ノイズ付与クエリを生成する。ここで、ε_２は、決定部１３１が決定した第２パラメータである。 Furthermore, the generation unit 132 functions as a second generation unit and generates a second noise-adding query. The second noise-adding query is a query that adds noise to each of the multiple second data included in the second data group such that noise has been added with a predetermined probability to the multiple data included in the integrated data group. For example, the generation unit 132 generates a second noise-adding query that adds noise to the second data of each of the n₂ attributes indicated by the second item information such that the noised second data of each of the n₂ attributes satisfies ε₂-local differential privacy. Here, ε₂ is the second parameter determined by the determination unit 131.

また、生成部１３２は、第２データ群に含まれる複数のデータＩＤを、第１不可逆変換クエリと同様に所定の方法により不可逆変換するクエリである第２不可逆変換クエリを生成する。 The generation unit 132 also generates a second irreversible conversion query, which is a query that irreversibly converts multiple data IDs included in the second data group using a predetermined method, similar to the first irreversible conversion query.
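
As a minimal sketch of the irreversible conversion, the following assumes SHA-256 as the hash function. The specification only says "a hash function", so the choice of SHA-256, the function name, and the sample IDs are assumptions. The point illustrated is that both devices, applying the same conversion, map the same data ID to the same string, so the converted IDs can still serve as a join key.

```python
import hashlib


def convert_data_id(data_id: str) -> str:
    """Irreversibly convert a data ID (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data_id.encode("utf-8")).hexdigest()


# Hypothetical IDs held by the first device and the second device.
first_ids = ["U001", "U002"]
second_ids = ["U002", "U003"]

converted_first = {convert_data_id(i) for i in first_ids}
converted_second = {convert_data_id(i) for i in second_ids}

# Only the user present in both groups yields a common converted ID.
common = converted_first & converted_second
```

A plain hash of a short, guessable ID can be reversed by hashing every candidate ID, which is one reason the specification also describes adding random data before the conversion.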

なお、生成部１３２は、第１データ群及び第２データ群に含まれる複数の属性それぞれに対応する複数のデータのうち、少なくとも一つの属性に対応するデータが取り得る値の数を減少させ、当該データが取り得る値の数を減少させた後に、複数のデータそれぞれにノイズを付与するノイズ付与クエリを生成してもよい。例えば、生成部１３２は、属性が「年齢」のデータが、複数のユーザそれぞれの実年齢を示している場合に、当該データについて、「１０代」、「２０代」といったように年代を示すデータに変更することにより、当該データが取り得る値を減少させる処理を含むノイズ付与クエリを生成する。このようにすることで、ユーザのプライバシーを高めることができる。 The generation unit 132 may generate a noise-added query that adds noise to each of the multiple data items by reducing the number of possible values of data corresponding to at least one attribute among the multiple data items corresponding to each of the multiple attributes included in the first data group and the second data group, and adding noise to each of the multiple data items after reducing the number of possible values of the data. For example, when data items having an attribute of "age" indicate the actual ages of multiple users, the generation unit 132 generates a noise-added query that includes a process of reducing the possible values of the data items by changing the data items to data indicating an age group, such as "teens" or "twenties." In this way, the privacy of the user can be improved.
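
The age example can be sketched as below. The function name and the band labels ("10s", "20s") are hypothetical. The sketch only shows the value-reduction step that runs before noise is added.

```python
def age_to_band(age: int) -> str:
    """Map an exact age to a ten-year band, e.g. 23 -> "20s"."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return f"{(age // 10) * 10}s"


ages = [17, 23, 29, 41]
bands = [age_to_band(a) for a in ages]

# The attribute now takes far fewer distinct values than raw ages do.
distinct_bands = sorted(set(bands))
```

After this step the noise-adding query operates on a domain of roughly ten bands instead of about a hundred distinct ages.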

また、生成部１３２は、第１データ群に含まれる複数のデータＩＤそれぞれに、ランダムデータを付加してから所定の方法により不可逆変換する第１不可逆変換クエリを生成し、第２データ群に含まれる複数のデータＩＤそれぞれに、当該データＩＤに対応する、第１データ群に含まれるデータＩＤに付加されたランダムデータと同一のランダムデータを付加してから所定の方法により不可逆変換する第２不可逆変換クエリを生成してもよい。このようにすることで、情報処理装置１は、変換後のデータＩＤを変換前のデータＩＤに復号されるリスクを低減することができる。 The generation unit 132 may also generate a first irreversible conversion query that adds random data to each of a plurality of data IDs included in the first data group and then performs irreversible conversion using a predetermined method, and generate a second irreversible conversion query that adds random data that is the same as the random data added to the data ID included in the first data group and that corresponds to the data ID, to each of a plurality of data IDs included in the second data group and then performs irreversible conversion using a predetermined method. In this way, the information processing device 1 can reduce the risk that the converted data ID will be decoded into the data ID before conversion.
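
The variant with random data can be sketched as follows. For simplicity this sketch uses a single random value shared by both queries rather than one value per data ID, and it prepends that value before hashing with SHA-256. Both simplifications, and all names, are assumptions for illustration.

```python
import hashlib
import secrets


def convert_with_random_data(data_id: str, random_data: bytes) -> str:
    """Prepend random data to the data ID, then hash (SHA-256 assumed)."""
    return hashlib.sha256(random_data + data_id.encode("utf-8")).hexdigest()


# Generated once by the coordinating device and embedded in both queries.
shared_random_data = secrets.token_bytes(16)

from_first = convert_with_random_data("U002", shared_random_data)
from_second = convert_with_random_data("U002", shared_random_data)
```

Because both queries carry the same random data, `from_first` and `from_second` match and the join still works, while someone who does not know the random data cannot rebuild the mapping by hashing candidate IDs.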

また、生成部１３２は、変換後の第１データ群に含まれる第１レコードに含まれる第１データを、第１の割合で他の第１レコードに含まれる第１データと入れ替えることにより変換後の第１データ群を更新するクエリである第１更新クエリを生成してもよい。また、生成部１３２は、変換後の第２データ群に含まれる第２レコードに含まれる第２データを、第２の割合で他の第２レコードに含まれる第２データと入れ替えることにより変換後の第２データ群を更新するクエリである第２更新クエリを生成してもよい。 The generating unit 132 may also generate a first update query, which is a query that updates the converted first data group by replacing the first data included in a first record included in the converted first data group with the first data included in another first record at a first ratio. The generating unit 132 may also generate a second update query, which is a query that updates the converted second data group by replacing the second data included in a second record included in the converted second data group with the second data included in another second record at a second ratio.

ここで、第１の割合及び第２の割合は同じであってもよいし、異なっていてもよい。また、第１の割合及び第２の割合は、データが取り得る値の数によって変化させてもよい。例えば、データが取り得る値の数が多い場合には、当該データが入れ替えられる割合を高くするようにしてもよい。 Here, the first rate and the second rate may be the same or different. Furthermore, the first rate and the second rate may be changed depending on the number of values that the data can take. For example, when the number of values that the data can take is large, the rate at which the data is replaced may be increased.
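
One way to picture the update query is the sketch below, which exchanges the value of one attribute between randomly chosen records at a given rate. The record layout (a list of dictionaries), the function name, and the pairing strategy are hypothetical. The specification does not state how the partner record is chosen.

```python
import random


def swap_attribute(records, attribute, rate, rng=random):
    """Return a copy of records in which each record, with probability `rate`,
    exchanges its `attribute` value with another randomly chosen record."""
    updated = [dict(r) for r in records]
    n = len(updated)
    if n < 2:
        return updated
    for i in range(n):
        if rng.random() < rate:
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1  # shift so that the partner is never record i itself
            updated[i][attribute], updated[j][attribute] = (
                updated[j][attribute],
                updated[i][attribute],
            )
    return updated
```

Swapping keeps the overall distribution of the attribute unchanged, since the same values are only redistributed among records, while weakening the link between an individual record and its value.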

また、後述の統合部１３５により、統合データ群が生成された後、第１データ群及び第２のデータ群のそれぞれに対して新たにレコードが追加され、新たなレコードを追加した統合データ群の生成が要求されることがある。全ての第１データ群と、全ての第２データ群とに対して、ノイズの付与が複数回繰り返されると、同一のデータ群に対応する複数のバリエーションのデータ群が生成される。この場合、複数のバリエーションのデータ群を分析することにより、匿名化が行われる前のデータ群の内容を推測しやすくなり、ユーザの識別性が上がる等のプライバシーリスクが増大するという問題が発生する。これに対し、生成部１３２は、新たに追加されたレコードのみに対してノイズを付与するノイズ付与クエリを生成してもよい。 In addition, after the integrated data group is generated by the integration unit 135 described below, new records may be added to each of the first data group and the second data group, and generation of an integrated data group with the new records added may be requested. When noise is added multiple times to all of the first data groups and all of the second data groups, multiple variations of data groups corresponding to the same data group are generated. In this case, by analyzing the multiple variations of data groups, it becomes easier to guess the contents of the data group before anonymization, which causes a problem of increased privacy risks such as increased user identifiability. In response to this, the generation unit 132 may generate a noise-added query that adds noise only to the newly added records.

この場合、第１データ群及び第２データ群には、新たに追加されたレコードを特定するために用いることができる特定用データが含まれている。特定用データは、例えば、日付を示す日付データや、レコードが統合データに含まれているか否かを示すフラグである。そして、生成部１３２は、第１不可逆変換クエリを再生成する場合、特定用データに基づいて、新たに追加された第１レコードに対してノイズを付与する第１ノイズ付与クエリを生成し、第２不可逆変換クエリを再生成する場合、特定用データに基づいて新たに追加された第２レコードに対してノイズを付与する第２ノイズ付与クエリを生成する。このようにすることで、情報処理装置１は、統合データ群を提供する場合にプライバシーリスクの増大を抑制することができる。 In this case, the first data group and the second data group contain identification data that can be used to identify the newly added record. The identification data is, for example, date data indicating a date, or a flag indicating whether the record is included in the integrated data. When regenerating the first irreversible conversion query, the generation unit 132 generates a first noise-adding query that adds noise to the newly added first record based on the identification data, and when regenerating the second irreversible conversion query, the generation unit 132 generates a second noise-adding query that adds noise to the newly added second record based on the identification data. In this way, the information processing device 1 can suppress an increase in privacy risk when providing an integrated data group.
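
The incremental case can be sketched as follows, using the flag form of the identification data. The flag name `in_integrated_data`, the record layout, and the `add_noise` callback are hypothetical.

```python
def noise_new_records(records, add_noise, flag="in_integrated_data"):
    """Apply add_noise only to records not yet included in the integrated data."""
    noised = []
    for record in records:
        if record.get(flag, False):
            continue  # already randomized once; do not produce a second variation
        noised.append(add_noise(record))
    return noised
```

Records that were already delivered are skipped, so no second randomized variation of the same original record is ever produced.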

送信部１３３は、生成部１３２が生成した第１ノイズ付与クエリと第１不可逆変換クエリとを、第１データ群の提供元に対応する第１装置２に送信する。また、送信部１３３は、生成部１３２が生成した第２ノイズ付与クエリと第２不可逆変換クエリとを、第２データ群の提供元に対応する第２装置３に送信する。送信部１３３は、例えば、予め情報処理装置１と第１装置２との間に設けられた第１のクラウドサービスで提供されるインターネットＶＰＮ（Virtual private network）を介して、第１ノイズ付与クエリと第１不可逆変換クエリとを第１装置２に送信する。同様に、送信部１３３は、例えば、予め情報処理装置１と第２装置３との間に設けられた第２のＶＰＮを介して、第２ノイズ付与クエリと第２不可逆変換クエリとを第２装置３に送信する。 The transmission unit 133 transmits the first noise-adding query and the first irreversible conversion query generated by the generation unit 132 to the first device 2 corresponding to the provider of the first data group. The transmission unit 133 also transmits the second noise-adding query and the second irreversible conversion query generated by the generation unit 132 to the second device 3 corresponding to the provider of the second data group. The transmission unit 133 transmits the first noise-adding query and the first irreversible conversion query to the first device 2, for example, via an Internet VPN (Virtual Private Network) that is provided by a first cloud service and established in advance between the information processing device 1 and the first device 2. Similarly, the transmission unit 133 transmits the second noise-adding query and the second irreversible conversion query to the second device 3, for example, via a second VPN established in advance between the information processing device 1 and the second device 3.

また、送信部１３３は、生成部１３２により、第１更新クエリと第２更新クエリとが生成された場合には、第１更新クエリを第１装置２に送信するとともに、第２更新クエリを第２装置３に送信する。 In addition, when the generation unit 132 generates a first update query and a second update query, the transmission unit 133 transmits the first update query to the first device 2 and transmits the second update query to the second device 3.

第１装置２は、情報処理装置１から受信したクエリを実行することにより、第１不可逆変換クエリに基づいてデータＩＤから変換されたデータ識別情報としての変換後のデータＩＤと、第１ノイズ付与クエリに基づいてノイズが付与された複数の第１データとを関連付けた複数の第１レコードを含む変換後の第１データ群を生成する。第１装置２は、例えば、情報処理装置１から、第１更新クエリを受信した場合には、第１ノイズ付与クエリに基づいて複数の第１データにノイズを付与する前に第１更新クエリを実行する。その後、第１装置２は、例えば第１のＶＰＮを介して、変換後の第１データ群を情報処理装置１に送信する。なお、第１装置２とは異なる装置が、変換後の第１データ群を情報処理装置１に送信してもよい。 The first device 2 executes the query received from the information processing device 1 to generate a converted first data group including a plurality of first records that associate a converted data ID as data identification information converted from a data ID based on the first irreversible conversion query with a plurality of first data to which noise has been added based on the first noise-adding query. For example, when the first device 2 receives a first update query from the information processing device 1, the first device 2 executes the first update query before adding noise to the plurality of first data based on the first noise-adding query. After that, the first device 2 transmits the converted first data group to the information processing device 1, for example, via the first VPN. Note that a device different from the first device 2 may transmit the converted first data group to the information processing device 1.

第２装置３は、情報処理装置１から受信した第２ノイズ付与クエリと第２不可逆変換クエリとを実行することにより、第２不可逆変換クエリに基づいてデータＩＤから変換されたデータ識別情報としての変換後のデータＩＤと、第２ノイズ付与クエリに基づいてノイズが付与された複数の第２データとを関連付けた複数の第２レコードを含む変換後の第２データ群を生成する。第２装置３は、例えば、情報処理装置１から、第２更新クエリを受信した場合には、第２ノイズ付与クエリに基づいて複数の第２データにノイズを付与する前に第２更新クエリを実行する。その後、第２装置３は、例えば第２のＶＰＮを介して、変換後の第２データ群を情報処理装置１に送信する。なお、第２装置３とは異なる装置が、変換後の第２データ群を情報処理装置１に送信してもよい。 The second device 3 executes the second noise-adding query and the second irreversible conversion query received from the information processing device 1 to generate a converted second data group including a plurality of second records that associate a converted data ID as data identification information converted from a data ID based on the second irreversible conversion query with a plurality of second data to which noise has been added based on the second noise-adding query. For example, when the second device 3 receives a second update query from the information processing device 1, the second device 3 executes the second update query before adding noise to the plurality of second data based on the second noise-adding query. After that, the second device 3 transmits the converted second data group to the information processing device 1, for example, via the second VPN. Note that a device different from the second device 3 may transmit the converted second data group to the information processing device 1.

データ群取得部１３４は、変換後の第１データ群と、変換後の第２データ群とを取得する。例えば、データ群取得部１３４は、第１装置２から送信された変換後の第１データ群を受信し、第２装置３から送信された変換後の第２データ群を受信することにより、変換後の第１データ群と、変換後の第２データ群とを取得する。 The data group acquisition unit 134 acquires the converted first data group and the converted second data group. For example, the data group acquisition unit 134 acquires the converted first data group and the converted second data group by receiving the converted first data group transmitted from the first device 2 and the converted second data group transmitted from the second device 3.

図４は、変換後の第１データ群と変換後の第２データ群との一例を示す図である。図４において、（Ａ）は変換後の第１データ群を示しており、（Ｂ）は変換後の第２データ群を示している。また、図４において、第１データ群と第２データ群とに含まれている同じデータＩＤが、同じ文字列に変換されていることが確認できる。また、図４において、太枠のセルで囲まれたデータが変換されていることが確認できる。 Figure 4 is a diagram showing an example of a first data group after conversion and a second data group after conversion. In Figure 4, (A) shows the first data group after conversion, and (B) shows the second data group after conversion. Also, in Figure 4, it can be seen that the same data ID contained in the first data group and the second data group has been converted into the same character string. Also, in Figure 4, it can be seen that the data surrounded by cells with a bold frame has been converted.

統合部１３５は、データ群取得部１３４が取得した変換後の第１データ群に含まれる複数の第１レコードそれぞれのデータＩＤ（変換後のデータＩＤ）と、データ群取得部１３４が取得した変換後の第２データ群に含まれる複数の第２レコードそれぞれのデータＩＤ（変換後のデータＩＤ）とに基づいて、変換後の第１データ群と変換後の第２データ群とを統合した統合データ群を生成する。具体的には、統合部１３５は、変換後のデータＩＤをキーとして、第１データ群と第２データ群とを結合することにより、統合データ群を生成する。図５は、統合データ群の一例を示す図である。図５に示すように、第１データ群と第２データ群との双方に含まれている変換後のデータＩＤに関連付けられている第１データと第２データとが関連付けられていることが確認できる。 The integration unit 135 generates an integrated data group by integrating the converted first data group and the converted second data group based on the data IDs (converted data IDs) of the first records included in the converted first data group acquired by the data group acquisition unit 134 and the data IDs (converted data IDs) of the second records included in the converted second data group acquired by the data group acquisition unit 134. Specifically, the integration unit 135 generates an integrated data group by combining the first data group and the second data group using the converted data ID as a key. FIG. 5 is a diagram showing an example of an integrated data group. As shown in FIG. 5, it can be seen that the first data and the second data associated with the converted data IDs included in both the first data group and the second data group are associated.

統合部１３５は、変換後のデータＩＤと、変換後の第１データ群と、変換後の第２データ群とを含む統合データ群を生成したが、これに限らない。統合部１３５は、変換後の第１データ群に含まれる複数の第１レコードそれぞれのデータＩＤと、変換後の第２データ群に含まれる複数の第２レコードそれぞれのデータＩＤとに基づいて、変換後の第１データ群と変換後の第２データ群とを統合し、データＩＤを除外して、統合データ群を生成してもよい。このようにすることで、統合データにはデータＩＤが含まれなくなるので、データＩＤに基づいて、統合データから、第１レコード及び第２レコードを復元されるリスクを低減することができる。 The integrating unit 135 generated an integrated data group including the converted data ID, the converted first data group, and the converted second data group, but this is not limited to the above. The integrating unit 135 may integrate the converted first data group and the converted second data group based on the data IDs of each of the multiple first records included in the converted first data group and the data IDs of each of the multiple second records included in the converted second data group, and may generate an integrated data group by excluding the data IDs. In this way, the integrated data does not include the data IDs, and therefore it is possible to reduce the risk that the first record and the second record will be restored from the integrated data based on the data IDs.
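
The integration step, including the variant that excludes the data ID, can be sketched as an inner join on the converted data ID. The record layout, the key name `data_id`, and the `keep_key` switch are hypothetical. An inner join is assumed because FIG. 5 is described as associating data for converted IDs present in both groups.

```python
def integrate(first_group, second_group, key="data_id", keep_key=True):
    """Join two converted data groups on the converted data ID."""
    second_by_id = {record[key]: record for record in second_group}
    integrated = []
    for first in first_group:
        second = second_by_id.get(first[key])
        if second is None:
            continue  # the ID appears only in the first group
        merged = {**first, **second}
        if not keep_key:
            merged.pop(key)  # drop the converted ID from the output
        integrated.append(merged)
    return integrated
```

Calling `integrate(first, second, keep_key=False)` yields records that carry only the noised attributes, which corresponds to the variant in which the integrated data contains no data ID.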

また、統合部１３５は、生成した統合データ群をさらに加工して統計データを生成してもよい。そして、統合部１３５は、生成した統計データに対してノイズの付与に用いられる所定の確率を用いてノイズを除去するよう補正を行うようにしてもよい。例えば、統合部１３５は、統合データを用いて統計値を計算する際に、第１データ群及び第２データ群に付与されたノイズの影響を排除するために、プライバシー強度のパラメータε、ε_１及びε_２の値の少なくともいずれかを用いて統計的に統計値を補正する。 The integration unit 135 may further process the generated integrated data group to generate statistical data. The integration unit 135 may then perform correction to remove noise from the generated statistical data using a predetermined probability used to add noise. For example, when calculating a statistical value using the integrated data, the integration unit 135 statistically corrects the statistical value using at least one of the values of privacy strength parameters ε, ε ₁ , and ε ₂ to remove the influence of the noise added to the first data group and the second data group.

例えば、統合データに含まれる、ある属性のデータにノイズを付与する際の遷移行列をＰとし、遷移行列Ｐに含まれる要素をｐ_ｉ，ｊとする。ｐ_ｉ，ｊは、ある属性の値ｉが値ｊにランダムに遷移する確率を示しており、例えば、上述した式（２）のｘをｉ、ｙをｊと置き換えた式を用いて決定される。統合部１３５は、統合データを加工して得られる、ある属性の分布Ｑ＝（ｑ_１，…，ｑ_ｄ）^Ｔを、遷移行列Ｐと、以下の式（４）とを用いて分布Ｑ’に補正する。 For example, let P be the transition matrix used when adding noise to the data of a certain attribute contained in the integrated data, and let p_{i,j} be the elements of P. p_{i,j} indicates the probability that the value i of the attribute randomly transitions to the value j, and is determined, for example, using formula (2) above with x replaced by i and y replaced by j. The integration unit 135 corrects the distribution Q = (q₁, ..., q_d)^T of the attribute, obtained by processing the integrated data, to a distribution Q' using the transition matrix P and the following formula (4).

ここで、ある属性のデータが第１データ群に含まれる場合には、式（２）に含まれるεに対し、第１パラメータε₁が適用され、ある属性のデータが第２データ群に含まれる場合には、式（２）に含まれるεに対し、第２パラメータε₂が適用されて遷移行列Ｐが構成される。また、εのみ分かる場合には、式（３）を用いて第１パラメータε₁及び第２パラメータε_２を導出し、同様に遷移行列Ｐが構成されるものとする。このようにすることで、情報処理装置１は、ノイズが付与される前の第１データ群及び第２データ群に対応する確率が高い統計データを生成することができる。 Here, when data of a certain attribute is included in the first data group, a first parameter ε ₁ is applied to ε included in formula (2), and when data of a certain attribute is included in the second data group, a second parameter ε ₂ is applied to ε included in formula (2) to configure the transition matrix P. Also, when only ε is known, the first parameter ε ₁ and the second parameter ε ₂ are derived using formula (3), and the transition matrix P is similarly configured. In this way, the information processing device 1 can generate statistical data that is highly likely to correspond to the first data group and the second data group before noise is added.
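
Formula (4) is not reproduced in this text. The sketch below shows one standard correction that is consistent with the definitions above, under the assumption that the transition matrix P is that of k-ary randomized response: p_{i,i} = a = e^ε/(d−1+e^ε) and p_{i,j} = b = 1/(d−1+e^ε) for i ≠ j. Under that assumption the observed distribution satisfies q_j = (a − b)·q'_j + b, so inverting P reduces to a closed form. The function names are hypothetical, and the ε passed in is the per-attribute parameter (ε₁ or ε₂).

```python
import math


def _diagonal_and_off_diagonal(epsilon: float, d: int):
    """Return (a, b): keep and flip probabilities of the assumed mechanism."""
    denom = d - 1 + math.exp(epsilon)
    return math.exp(epsilon) / denom, 1.0 / denom


def noisy_distribution(true_dist, epsilon):
    """Expected distribution of an attribute after noise is added."""
    a, b = _diagonal_and_off_diagonal(epsilon, len(true_dist))
    return [(a - b) * t + b for t in true_dist]


def correct_distribution(observed, epsilon):
    """Estimate the pre-noise distribution Q' from the observed distribution Q."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive for the correction to exist")
    a, b = _diagonal_and_off_diagonal(epsilon, len(observed))
    return [(q - b) / (a - b) for q in observed]
```

With a finite number of records the corrected values can fall slightly outside the range 0 to 1, so a practical implementation would typically clip and renormalize them. That post-processing is not described in the specification.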

なお、統合部１３５により統合された統合データ群は、送信部１３３により、第１装置２及び第２装置３に送信されてもよい。このようにすることで、第１事業者において、第２事業者が収集した第２データに基づいてデータ分析を行うことができるとともに、第２事業者において、第１事業者が収集した第１データに基づいてデータ分析を行うことができる。 The integrated data group integrated by the integration unit 135 may be transmitted by the transmission unit 133 to the first device 2 and the second device 3. In this manner, the first business operator can perform data analysis based on the second data collected by the second business operator, and the second business operator can perform data analysis based on the first data collected by the first business operator.

［動作シーケンス］ [Operation sequence]
続いて、情報処理装置１に係る処理の流れについて説明する。図６は、情報処理装置１が統合データ群を生成するまでの処理の流れを示すシーケンス図である。 Next, the flow of processing in the information processing device 1 will be described. FIG. 6 is a sequence diagram showing the flow of processing up to the point at which the information processing device 1 generates an integrated data group.

まず、決定部１３１は、第１装置２から第１レコードを構成する複数の属性それぞれに対応する項目を示す第１項目情報を取得するとともに（Ｓ１）、第２装置３から第２レコードを構成する複数の属性それぞれに対応する項目を示す第２項目情報を取得する（Ｓ２）。 First, the determination unit 131 acquires from the first device 2 first item information indicating items corresponding to each of the multiple attributes constituting the first record (S1), and acquires from the second device 3 second item information indicating items corresponding to each of the multiple attributes constituting the second record (S2).

続いて、決定部１３１は、取得した第１項目情報と、取得した第２項目情報とに基づいて、データに対応する属性の数を特定する（Ｓ３）。具体的には、決定部１３１は、第１データに対応する属性の個数ｎ_１と、第２データに対応する属性の個数ｎ_２とを特定する。そして、決定部１３１は、特定した属性の個数ｎ_１、ｎ_２に基づいて、ノイズが付与された後の複数の属性それぞれの第１データ及び第２データが満たす局所型差分プライバシーにおけるプライバシーの強度を示す第１パラメータε_１及び第２パラメータε_２を決定する（Ｓ４）。 Next, the determination unit 131 identifies the number of attributes corresponding to the data based on the acquired first item information and the acquired second item information (S3). Specifically, the determination unit 131 identifies the number n₁ of attributes corresponding to the first data and the number n₂ of attributes corresponding to the second data. Then, based on the identified numbers of attributes n₁ and n₂, the determination unit 131 determines a first parameter ε₁ and a second parameter ε₂ indicating the strength of privacy in the local differential privacy satisfied by the first data and the second data of each of the multiple attributes after noise is added (S4).

続いて、生成部１３２は、第１ノイズ付与クエリ、第２ノイズ付与クエリ、第１不可逆変換クエリ及び第２不可逆変換クエリを生成する（Ｓ５）。生成部１３２は、取得した第１項目情報と、決定部１３１が決定した第１パラメータε_１とに基づいて、第１ノイズ付与クエリを生成し、取得した第２項目情報と、決定部１３１が決定した第２パラメータε_２とに基づいて、第２ノイズ付与クエリを生成する。また、生成部１３２は、第１データに含まれるデータＩＤを不可逆変換する第１不可逆変換クエリ、及び、第２データに含まれるデータＩＤを不可逆変換する第２不可逆変換クエリを生成する。 Next, the generation unit 132 generates a first noise-adding query, a second noise-adding query, a first irreversible conversion query, and a second irreversible conversion query (S5). The generation unit 132 generates the first noise-adding query based on the acquired first item information and the first parameter ε₁ determined by the determination unit 131, and generates the second noise-adding query based on the acquired second item information and the second parameter ε₂ determined by the determination unit 131. The generation unit 132 also generates a first irreversible conversion query that irreversibly converts the data IDs included in the first data, and a second irreversible conversion query that irreversibly converts the data IDs included in the second data.

続いて、送信部１３３は、第１ノイズ付与クエリ及び第１不可逆変換クエリを第１装置２に送信し（Ｓ６）、第２ノイズ付与クエリ及び第２不可逆変換クエリを第２装置３に送信する（Ｓ７）。 Then, the transmission unit 133 transmits the first noise-adding query and the first irreversible conversion query to the first device 2 (S6), and transmits the second noise-adding query and the second irreversible conversion query to the second device 3 (S7).

第１装置２は、情報処理装置１から受信したクエリを実行することにより変換後の第１データ群を生成する（Ｓ８）。第２装置３は、情報処理装置１から受信したクエリを実行することにより変換後の第２データ群を生成する（Ｓ９）。第１装置２は、変換後の第１データ群を情報処理装置１に送信し（Ｓ１０）、第２装置３は、変換後の第２データ群を情報処理装置１に送信する（Ｓ１１）。データ群取得部１３４は、第１装置２から送信された変換後の第１データ群を受信し、第２装置３から送信された変換後の第２データ群を受信する。 The first device 2 generates a converted first data group by executing the query received from the information processing device 1 (S8). The second device 3 generates a converted second data group by executing the query received from the information processing device 1 (S9). The first device 2 transmits the converted first data group to the information processing device 1 (S10), and the second device 3 transmits the converted second data group to the information processing device 1 (S11). The data group acquisition unit 134 receives the converted first data group transmitted from the first device 2, and receives the converted second data group transmitted from the second device 3.

The integration unit 135 generates an integrated data group by integrating the converted first data group and the converted second data group (S12). The integration is based on the converted data ID of each of the plurality of first records included in the converted first data group and the converted data ID of each of the plurality of second records included in the converted second data group, both of which were received by the data group acquisition unit 134.
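The integration in S12 can be sketched as a join on the converted data ID. The following Python example is illustrative only: the function name `integrate`, the dictionary-based record layout, and the choice of an inner join (records whose converted ID appears in only one group are dropped) are assumptions. Dropping the ID from the output follows the variant in which the integrated data group is generated with the data identification information excluded.

```python
def integrate(first_group, second_group):
    """Join two converted data groups on the converted data ID.

    Only IDs present in both groups are kept (inner join, an assumption),
    and the ID itself is removed from the integrated records.
    """
    second_by_id = {rec["id"]: rec for rec in second_group}
    integrated = []
    for first in first_group:
        second = second_by_id.get(first["id"])
        if second is None:
            continue  # no matching record in the second group
        merged = {k: v for k, v in first.items() if k != "id"}
        merged.update({k: v for k, v in second.items() if k != "id"})
        integrated.append(merged)
    return integrated
```

The lookup table keeps the join linear in the number of records, which matters little for a sketch but mirrors how a hash join would behave in a database.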

[Modification 1]
In the above embodiment, the information processing device 1 generates two noise-adding queries and two irreversible conversion queries corresponding to the first data group and the second data group, but the invention is not limited to this. The information processing device 1 may generate noise-adding queries and irreversible conversion queries corresponding to three or more data groups.

For example, suppose that there are k data groups (where k is an integer of 3 or more) associated with a data ID serving as data identification information, and that each k-th record included in the k-th data group includes a plurality of k-th data corresponding to the respective n_k attributes.

In this case, the generation unit 132 functions as a k-th generation unit and generates a k-th noise-adding query and a k-th irreversible conversion query. The k-th noise-adding query adds noise to each of the plurality of k-th data included in the k-th data group such that, when the k data groups are integrated into an integrated data group, noise has been added with a predetermined probability to the plurality of data included in the integrated data group. The k-th irreversible conversion query irreversibly converts, by a predetermined method, the plurality of pieces of data identification information included in the k-th data group. The generation unit 132 generates the k-th noise-adding query so that, once noise has been added to the k-th data of each of the n_k attributes, the noised k-th data of each attribute satisfies ε_k-local differential privacy. Here, the parameter ε_k indicating the strength of privacy is ε_k = ε / (n_1 + n_2 + ... + n_k).
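The budget split ε_k = ε / (n_1 + n_2 + ... + n_k) can be checked numerically. The short Python sketch below uses a hypothetical helper name, `per_attribute_epsilon`, and made-up attribute counts. It relies on the standard sequential composition property of differential privacy: if each of the n_1 + ... + n_k attributes is noised with budget ε_k, the budgets add up to ε for the integrated record.

```python
def per_attribute_epsilon(epsilon, attribute_counts):
    """Return the per-attribute budget eps / (n_1 + n_2 + ... + n_k)."""
    total = sum(attribute_counts)
    if total <= 0:
        raise ValueError("at least one attribute is required")
    return epsilon / total


# Hypothetical example: three data groups with 2, 3, and 1 attributes.
counts = [2, 3, 1]
eps_k = per_attribute_epsilon(3.0, counts)

# Summing the per-attribute budgets over all attributes recovers epsilon.
total_budget = eps_k * sum(counts)
```

With these example numbers every attribute in every data group receives the same budget, so a provider that contributes more attributes consumes a proportionally larger share of ε.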

The generation unit 132 also generates the k-th irreversible conversion query, which is a query for irreversibly converting, by a predetermined method, the plurality of users included in the k-th data group. The transmission unit 133 transmits the generated k-th noise-adding query and k-th irreversible conversion query to the k-th device.

The data group acquisition unit 134 acquires the converted k-th data group from the k-th device. The integration unit 135 generates an integrated data group by joining the k data groups based on the data identification information of each of the plurality of k-th records included in the converted k-th data group acquired by the data group acquisition unit 134. In this way, the information processing device 1 can generate integrated data that satisfies ε-local differential privacy even when there are three or more data groups.

[Modification 2]
In the above embodiment, the generation unit 132 generates a first update query for replacing first data included in a first record of the converted first data group with first data included in another first record, and a second update query for replacing second data included in a second record of the converted second data group with second data included in another second record; the first device 2 executes the first update query and the second device 3 executes the second update query. However, the invention is not limited to this. The information processing device 1 may perform the data replacement itself.

In this case, the control unit 13 has an update unit that updates the integrated data group by replacing, at a third ratio, data included in records of the integrated data group with data included in other records. For example, for a subset of the items in a record of the integrated data group, the update unit replaces the data of those items, at the third ratio, with data of the same items included in other records, thereby updating the integrated data group.
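One way to picture the replacement at a given ratio is the following Python sketch. It is an assumption-laden illustration, not the patent's procedure: the function name `swap_column` is hypothetical, the ratio is interpreted as the share of records selected for swapping, and the selected values of one item are simply permuted among the selected records.

```python
import random


def swap_column(records, item, ratio, rng):
    """Return a copy of `records` with values of `item` exchanged between records.

    A `ratio` share of the records is selected at random, and the values of
    `item` are permuted among the selected records. A value may happen to
    land back in its own record.
    """
    updated = [dict(rec) for rec in records]  # leave the input untouched
    count = int(len(updated) * ratio)
    chosen = rng.sample(range(len(updated)), count)
    values = [updated[i][item] for i in chosen]
    rng.shuffle(values)
    for i, value in zip(chosen, values):
        updated[i][item] = value
    return updated
```

Because values are only moved between records, the distribution of each item over the whole data group is unchanged, while the link between an item's value and the rest of a record is weakened for the selected share.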

The update unit may also update the converted first data group by replacing, at a first ratio, first data included in a first record of the converted first data group acquired by the data group acquisition unit 134 with first data included in another first record, and update the converted second data group by replacing, at a second ratio, second data included in a second record of the converted second data group acquired by the data group acquisition unit 134 with second data included in another second record. For example, the update unit updates the converted first data group and the converted second data group by executing the first update query and the second update query generated by the generation unit 132. The integration unit 135 then generates integrated data by integrating the updated first data group and the updated second data group. In this way, the information processing device 1 can reduce the load on the first device 2 and the second device 3 associated with converting the data groups.

[Effects of the information processing device 1]
As described above, the information processing device 1 according to this embodiment generates four queries: a first noise-adding query that adds noise to the first data group such that, when the first data group and the second data group are integrated into an integrated data group, noise has been added with a predetermined probability to the plurality of data included in the integrated data group; a second noise-adding query that adds noise to the second data group; a first irreversible conversion query that irreversibly converts the plurality of pieces of data identification information included in the first data group; and a second irreversible conversion query that irreversibly converts the plurality of pieces of data identification information included in the second data group. The information processing device 1 transmits the first noise-adding query and the first irreversible conversion query to the first device 2, and transmits the second noise-adding query and the second irreversible conversion query to the second device 3. The information processing device 1 then acquires a converted first data group including a plurality of first records, each associating data identification information converted based on the first irreversible conversion query with a plurality of first data to which noise has been added based on the first noise-adding query, and a converted second data group including a plurality of second records, each associating data identification information converted based on the second irreversible conversion query with a plurality of second data to which noise has been added based on the second noise-adding query. It integrates these data groups based on the data identification information they contain and generates the integrated data group with the data identification information excluded. In this way, the information processing device 1 can prevent user information that has not been anonymized from leaking in the course of collecting user information.

Furthermore, the present invention makes it possible to contribute to Goal 9, "Industry, Innovation and Infrastructure," of the Sustainable Development Goals (SDGs) led by the United Nations.

Although the present invention has been described above using embodiments, the technical scope of the present invention is not limited to the scope described in the above embodiments, and various modifications and changes are possible within the scope of its gist. For example, all or part of the device can be functionally or physically distributed or integrated in arbitrary units. New embodiments produced by any combination of a plurality of embodiments are also included in the embodiments of the present invention. A new embodiment produced by such a combination also has the effects of the original embodiments.

Reference Signs List
1 Information processing device
2 First device
3 Second device
11 Communication unit
12 Storage unit
13 Control unit
131 Determination unit
132 Generation unit
133 Transmission unit
134 Data group acquisition unit
135 Integration unit
S Information processing system

Claims

a first generation unit that generates, for a first data group including a plurality of first records each associating first data with data identification information for identifying data and a second data group including a plurality of second records each associating the data identification information with second data, a first noise-adding query and a first irreversible conversion query, the first noise-adding query being a query for adding noise to each of the plurality of first data included in the first data group such that, when the first data group and the second data group are integrated into an integrated data group, noise has been added with a predetermined probability to the plurality of data included in the integrated data group, and the first irreversible conversion query being a query for irreversibly converting, by a predetermined method, the plurality of pieces of the data identification information included in the first data group;
a second generation unit that generates a second noise-adding query, which is a query for adding noise to each of the plurality of second data included in the second data group in such a manner that noise is added to the plurality of data included in the integrated data group with the predetermined probability, and a second irreversible conversion query, which is a query for irreversibly converting the plurality of data identification information included in the second data group by the predetermined method;
a transmission unit that transmits the first noise-adding query and the first irreversible conversion query generated by the first generation unit to a first device corresponding to a provider of the first data group, and transmits the second noise-adding query and the second irreversible conversion query generated by the second generation unit to a second device corresponding to a provider of the second data group;
a data group acquiring unit that acquires a converted first data group including a plurality of first records that associate the data identification information converted based on the first irreversible conversion query with a plurality of first data to which noise has been added based on the first noise-adding query, and a converted second data group including a plurality of second records that associate the data identification information converted based on the second irreversible conversion query with a plurality of second data to which noise has been added based on the second noise-adding query;
an integration unit that generates an integrated data group by integrating the converted first data group and the converted second data group based on the data identification information of each of a plurality of first records included in the converted first data group acquired by the data group acquisition unit and the data identification information of each of a plurality of second records included in the converted second data group acquired by the data group acquisition unit;
An information processing device having the above configuration.

the first record includes a plurality of first data corresponding to each of the n1 attributes;
the second record includes a plurality of second data corresponding to each of the n2 attributes;
The plurality of data included in the integrated data group satisfies ε-local differential privacy, where ε is a parameter indicating the strength of privacy,
the first generation unit generates the first noise-adding query for adding noise such that, when noise is added to the first data of each of the n1 attributes, the first data of each of the n1 attributes to which noise has been added satisfies ε1-local differential privacy (wherein a parameter ε1 indicating a strength of privacy is ε1 = ε/(n1+n2));
the second generation unit generates the second noise-adding query for adding noise such that, when noise is added to the second data of each of the n2 attributes, the second data of each of the n2 attributes to which noise has been added satisfies ε2-local differential privacy (wherein a parameter ε2 indicating a strength of privacy is ε2 = ε/(n1+n2));
The information processing device according to claim 1.

There are k data groups associated with the data identification information (where k is an integer of 3 or more);
a k-th generation unit generates a k-th noise-adding query, which is a query for adding noise to each of the k-th data included in the k-th data group so that noise is added to the multiple data included in the integrated data group with a predetermined probability when the k data groups are integrated to form the integrated data group, and a k-th irreversible conversion query, which is a query for irreversibly converting the multiple pieces of data identification information included in the k-th data group by a predetermined method;
a k-th record included in the k-th data group includes a plurality of k-th data corresponding to n_k attributes,
the k-th generation unit generates the k-th noise-adding query for adding noise such that, when noise is added to the k-th data of each of the n_k attributes, the k-th data of each of the n_k attributes to which noise has been added satisfies ε_k-local differential privacy (wherein a parameter ε_k indicating the strength of privacy is ε_k = ε / (n_1 + n_2 + ... + n_k)).
The information processing device according to claim 2.

a determination unit that obtains from the first device first item information indicating items corresponding to each of a plurality of attributes constituting the first record, and obtains from the second device second item information indicating items corresponding to each of a plurality of attributes constituting the second record, identifies n1, which is the number of attributes corresponding to the first data, based on the obtained first item information, identifies n2, which is the number of attributes corresponding to the second data, based on the obtained second item information, and determines a first parameter ε1 and a second parameter ε2 indicating a strength of privacy in local differential privacy satisfied by the first data and the second data after noise has been added, based on the identified numbers of attributes n1 and n2;
The information processing device according to claim 2.

At least one of the first noise-adding query and the second noise-adding query is a query that reduces a number of possible values of data corresponding to at least one attribute among a plurality of data corresponding to each of a plurality of attributes included in a data group to which noise is added, and adds the noise to each of the plurality of data after reducing the number of possible values of the data.
The information processing device according to claim 1.

the first generation unit generates a first update query that is a query that updates the first data group after the conversion by replacing first data included in the first record included in the first data group after the conversion with the first data included in another first record at a first ratio;
the second generation unit generates a second update query that is a query that updates the converted second data group by replacing second data included in the second records included in the converted second data group with the second data included in other second records at a second ratio;
The transmission unit transmits the first update query generated by the first generation unit to the first device, and transmits the second update query generated by the second generation unit to the second device.
The information processing device according to claim 1.

an updating unit that updates the integrated data set by replacing data included in records included in the integrated data set with data included in other records at a third ratio;
The information processing device according to claim 1.

the first generation unit generates the first irreversible conversion query by adding random data to each of the plurality of pieces of data identification information included in the first data group and then performing irreversible conversion using the predetermined method;
the second generation unit generates the second irreversible conversion query by performing irreversible conversion using the predetermined method after adding random data that is the same as the random data added to the data identification information included in the first data group and that corresponds to the data identification information, to each of the plurality of data identification information included in the second data group.
The information processing device according to claim 1.

the integration unit further processes the integrated data group to generate statistical data, and corrects the statistical data to remove the noise using the predetermined probability used to impart the noise.
The information processing device according to claim 1.

the first data group and the second data group include identification data that can be used to identify a newly added record;
When regenerating the first irreversible conversion query, the first generation unit generates the first noise-adding query that adds the noise to a newly added first record based on the identification data;
When regenerating the second irreversible conversion query, the second generation unit generates the second noise-adding query that adds the noise to a newly added second record based on the identification data.
The information processing device according to claim 1.

the integrating unit integrates the converted first data group and the converted second data group based on the data identification information of each of a plurality of first records included in the converted first data group and the data identification information of each of a plurality of second records included in the converted second data group, and generates the integrated data group by excluding the data identification information.
The information processing device according to claim 1.

Executed by the information processing device,
a step of generating, for a first data group including a plurality of first records each associating first data with data identification information for identifying data and a second data group including a plurality of second records each associating the data identification information with second data, a first noise-adding query and a first irreversible conversion query, the first noise-adding query being a query for adding noise to each of the plurality of first data included in the first data group such that, when the first data group and the second data group are integrated into an integrated data group, noise has been added with a predetermined probability to the plurality of data included in the integrated data group, and the first irreversible conversion query being a query for irreversibly converting, by a predetermined method, the plurality of pieces of the data identification information included in the first data group;
generating a second noise-adding query that is a query for adding noise to each of the plurality of second data included in the second data group in such a manner that noise is added to the plurality of data included in the integrated data group with the predetermined probability, and a second irreversible conversion query that is a query for irreversibly converting the plurality of data identification information included in the second data group by the predetermined method;
transmitting the generated first noise-adding query and the generated first irreversible conversion query to a first device corresponding to a provider of the first data group, and transmitting the generated second noise-adding query and the generated second irreversible conversion query to a second device corresponding to a provider of the second data group;
acquiring a converted first data group including a plurality of first records associating the data identification information converted based on the first irreversible conversion query with a plurality of first data to which noise has been added based on the first noise-adding query, and acquiring a converted second data group including a plurality of second records associating the data identification information converted based on the second irreversible conversion query with a plurality of second data to which noise has been added based on the second noise-adding query;
generating an integrated data group by integrating the converted first data group and the converted second data group based on the data identification information of each of a plurality of first records included in the acquired converted first data group and the data identification information of each of a plurality of second records included in the acquired converted second data group;
An information processing method comprising the steps of: