WO2016002086A1

WO2016002086A1 - Anonymized data providing device and method

Info

Publication number: WO2016002086A1
Application number: PCT/JP2014/067983
Authority: WO
Inventors: 啓成藤原; 佐藤　嘉則
Original assignee: 株式会社日立製作所
Priority date: 2014-07-04
Filing date: 2014-07-04
Publication date: 2016-01-07
Also published as: JPWO2016002086A1; JP6263620B2

Abstract

[Problem] To propose an anonymized data providing device and method capable of providing an anonymized data set that matches the needs of a data user. [Solution] A data user sets, as a user requirement, an allowable significance level for a desired statistic of a desired attribute of a desired data set. An anonymization process is executed a plurality of times on the data set selected by the data user. For each of the anonymized data sets obtained by these anonymization processes, the statistic set by the data user is calculated and compared with the same statistic of the original data set. An anonymized data set for which the difference in the statistic satisfies the allowable significance level set by the data user is selected as the anonymized data set satisfying the user requirement, and the selected anonymized data set is provided to the data user.

Description

Anonymized data providing apparatus and method

The present invention relates to an anonymized data providing apparatus and method, and is suitably applied to an anonymized data providing system that anonymizes or obscures privacy-related information before providing data for secondary use.

In recent years, the amount of stored information has increased explosively with advances in information technology, such as lower storage costs, larger storage capacities, and the spread of networks. Against this background, efforts to make use of so-called big data have become active.

Secondary use of personal information contained in big data requires privacy protection. Simply deleting personal information, or converting an ID that identifies an individual into another ID, still leaves a risk that the individual can be identified by combining several conditions. For this reason, the k-anonymization technique is widely used as a safer way to protect privacy. k-anonymization transforms the original data so that, for any combination of values of the attributes to be obscured, at least k records (k is hereinafter referred to as the k value) share that same combination.
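As an illustration only, the k-anonymity condition described above can be checked with a short Python sketch. The record layout, the attribute names (`age`, `address`, `test_value`), and the function name `is_k_anonymous` are assumptions introduced for this example and do not appear in the original text.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    that occurs in the records occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


# Hypothetical, already generalized records (not taken from the patent figures).
records = [
    {"age": "30-39", "address": "Tokyo", "test_value": 5.1},
    {"age": "30-39", "address": "Tokyo", "test_value": 6.3},
    {"age": "40-49", "address": "Osaka", "test_value": 5.8},
    {"age": "40-49", "address": "Osaka", "test_value": 7.0},
]

satisfies_k2 = is_k_anonymous(records, ["age", "address"], 2)
satisfies_k3 = is_k_anonymous(records, ["age", "address"], 3)
```

In this sketch each combination of `age` and `address` appears twice, so the data satisfies the condition for k = 2 but not for k = 3.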

As an anonymization method using the k-anonymization technique, Patent Document 1, for example, discloses a method in which priorities are set for each piece of data at the time of data processing (during k-anonymization processing) and the transformation is evaluated using a function, so that the data requested by the data user is retained as much as possible and loss of the information requested by the data user is prevented.

Patent Document 2 discloses a method for anonymizing a database (by k-anonymization, l-diversity, and the like) used for testing database-centric applications, in which quasi-identifiers are ranked according to their impact on testing so that the anonymized database remains useful for testing.

Patent Document 1: JP 2011-113285 A
Patent Document 2: US Patent Application Publication No. 2012/0036135

In anonymization processing using the k-anonymization technique (hereinafter referred to as k-anonymization processing where appropriate), the safety of personal information increases as the k value increases, but so does the amount of information lost. In other words, k-anonymization processing involves a trade-off between the security of the information and its accuracy.

In conventional k-anonymization processing, a data group (for example, a data group of diabetic patients or hypertension patients; hereinafter referred to as a data set) is k-anonymized on the basis of evaluation indices such as the k value and the amount of information loss. In such processing, an attribute that the data user wants to prioritize (hereinafter referred to as a priority attribute) may be anonymized in order to improve the evaluation index, so the resulting k-anonymized data set (hereinafter referred to as an anonymized data set) may not meet the needs of the data user.

As one method for solving such a problem, it is conceivable to perform k-anonymization processing on the data group to be analyzed, excluding the priority attribute of the data user. However, according to such a method, when a priority attribute must be included, an anonymized data set more suitable for analysis purposes cannot be selected.

The present invention has been made in view of the above points, and intends to propose an anonymized data providing apparatus and method that can provide an anonymized data set that meets the needs of data users.

To solve this problem, the present invention provides an anonymized data providing apparatus that anonymizes original data and provides it to a data user, the apparatus comprising: an anonymization processing unit that executes anonymization processing on a data set of the original data; an anonymized data selection processing unit that controls the anonymization processing unit; and a data providing unit that manages the data set that has undergone the anonymization processing as an anonymized data set and provides the anonymized data set to the data user in response to a request from the data user. The data user selects a desired data set and sets, as a user requirement, an allowable significance level for a desired statistic of a desired attribute of that data set. The anonymized data selection processing unit controls the anonymization processing unit so as to execute the anonymization processing a plurality of times on the data set selected by the data user, calculates the statistic set by the data user for each of the plurality of anonymized data sets obtained by these executions, compares the calculated statistic of each anonymized data set with the statistic of the data set of the original data, and selects an anonymized data set for which the difference in the statistic satisfies the allowable significance level set by the data user as the anonymized data set satisfying the user requirement. The data providing unit provides the anonymized data set selected by the anonymized data selection processing unit to the data user.

The present invention also provides an anonymized data providing method executed in an anonymized data providing apparatus that anonymizes original data and provides it to a data user. The anonymized data providing apparatus comprises an anonymization processing unit that executes anonymization processing on a data set of the original data, an anonymized data selection processing unit that controls the anonymization processing unit, and a data providing unit that manages the data set that has undergone the anonymization processing as an anonymized data set and provides the anonymized data set to the data user in response to a request from the data user. The data user selects a desired data set and sets, as a user requirement, an allowable significance level for a desired statistic of a desired attribute of that data set. The method comprises: a first step in which the anonymized data selection processing unit controls the anonymization processing unit so as to execute the anonymization processing a plurality of times on the data set selected by the data user; a second step in which the anonymized data selection processing unit calculates the statistic set by the data user for each of the plurality of anonymized data sets obtained by the plurality of executions of the anonymization processing; a third step in which the anonymized data selection processing unit compares the calculated statistic of each anonymized data set with the statistic of the data set of the original data and selects an anonymized data set for which the difference in the statistic satisfies the allowable significance level set by the data user as the anonymized data set satisfying the user requirement; and a fourth step in which the data providing unit provides the anonymized data set selected by the anonymized data selection processing unit to the data user.

According to the anonymized data providing apparatus and method, an anonymized data set that satisfies the user requirements set by the data user can be provided to the data user.

According to the present invention, it is possible to realize an anonymized data provision device and method that can provide anonymized data set that meets the needs of a data user to the data user.

A block diagram showing the hardware configuration of the anonymized data providing system according to this embodiment.
A block diagram showing the logical configuration of the anonymized data providing system according to this embodiment.
A conceptual diagram showing the structure of the original data and the pre-anonymization data set.
A conceptual diagram showing the structure of the privacy protection condition table.
A conceptual diagram showing the structure of an anonymized data set.
A conceptual diagram showing the structure of the data catalog information.
A schematic diagram showing the configuration of the data catalog selection screen.
A schematic diagram showing the configuration of the statistics strengthening item designation screen.
A ladder chart showing the flow of processing related to the provision of an anonymized data set.
A flowchart showing the procedure of the statistic-enhanced anonymized data selection process.
A flowchart showing the procedure of the k-value-variable statistic enhancement process.
A flowchart showing the procedure of the k-value-fixed statistic enhancement process.
A flowchart showing the procedure of the k-value-fixed statistic enhancement process.
A chart showing an example of a k-anonymization parameter group.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.

(1) Configuration of Information Processing System According to this Embodiment

In FIG. 1, reference numeral 1 denotes the information processing system according to this embodiment as a whole. In this information processing system 1, the data collection / management / provider 3 collects data provided by the original data providers 2 (hereinafter referred to as original data), k-anonymizes and manages the collected original data, and provides the k-anonymized original data in response to a request from the data user 4.

As shown in FIG. 1, in the information processing system 1, the information processing device 20 of each original data provider 2 is connected via the first network 10 to the data preparation device 31, which constitutes part of the anonymized data providing system 30 of the data collection / management / provider 3, and the data providing device 32, which also constitutes part of the anonymized data providing system 30, is connected via the second network 11 to the client terminal 40 of the data user 4.

The information processing apparatus 20 is composed of a personal computer or the like that includes a CPU (Central Processing Unit) 21, a memory 22, a hard disk device (HDD: Hard Disk Drive) 23, an input device 24, a monitor 25, and the like. Original data is accumulated in the hard disk device 23, and this original data is collected by the data preparation device 31 of the anonymized data providing system 30 via the first network 10.

The data preparation device 31 is composed of a personal computer or the like that includes a CPU 33, a memory 34, a hard disk device 35, and the like. The data preparation device 31 performs k-anonymization processing on the original data collected from the information processing device 20 of each original data provider 2, and transmits the anonymized data obtained by the k-anonymization processing to the data providing device 32.

Like the data preparation device 31, the data providing device 32 is composed of a personal computer or the like that includes a CPU 36, a memory 37, a hard disk device 38, and the like. The data providing device 32 stores the anonymized data transmitted from the data preparation device 31 in the hard disk device 38, and provides the stored anonymized data to the data user 4 in response to a request made by the data user 4 using the client terminal 40.

The client terminal 40 is also composed of a personal computer or the like that includes a CPU 41, a memory 42, a hard disk device 43, an input device 44, a monitor 45, and the like. In accordance with operations by the data user 4, the client terminal 40 accesses the data providing device 32 of the anonymized data providing system 30 via the second network 11, downloads the anonymized data provided by the data providing device 32, and stores it in the hard disk device 43.

FIG. 2 shows the logical configuration of the information processing system 1. In the present embodiment, healthcare data is assumed as the original data collected, managed, and provided by the data collection / management / provider 3, and, as shown in FIG. 2, the original data providers 2 are assumed to be entities that manage such data: hospitals, pharmacies, clinics, health insurance associations, biobanks and / or households. However, since healthcare data originally belongs to the individual, the original data provider 2 may instead be the individual, with the healthcare data collected directly from that individual.

In this embodiment, the data users 4 are assumed to include not only hospitals, insurance stations, administrative organizations such as the Ministry of Health, Labor and Welfare, and organizations with a public role such as health insurance associations, but also private companies such as pharmaceutical companies, food companies, and beauty companies.

The data preparation device 31 comprises a data collection / name identification processing unit 50, an anonymization processing unit 51, and a statistic-enhanced anonymized data selection processing unit 52, which are stored in the memory 34 (FIG. 1), and an original data database 53 and an anonymization condition database 54, which are stored in the hard disk device 35 (FIG. 1).

The data collection / name identification processing unit 50 is a program that collects original data (healthcare data) from the information processing apparatus 20 (FIG. 1) of each original data provider 2 and stores the collected original data in the original data database 53. The data collection / name identification processing unit 50 also executes name identification processing, which consolidates the original data of a single person into one piece of original data when that person's data is spread across a plurality of original data providers 2.

The data preparation device 31 may collect data from an original data provider 2 every time the data is updated by that original data provider 2, or may collect one day's worth of updates once a day, at night. The original data may be transmitted from the information processing apparatus 20 to the data preparation device 31, or may be retrieved by the data preparation device 31 from each information processing apparatus 20.

The anonymization processing unit 51 is a program that anonymizes the personal information in the original data stored in the original data database 53 while referring to the privacy protection condition table 56 stored in the anonymization condition database 54. The anonymization processing unit 51 generates one data set (hereinafter referred to as a pre-anonymization data set) 55 by collecting a plurality of related pieces of original data, such as the original data of diabetes patients or of hypertension patients, and generates an anonymized data set 64 by performing k-anonymization processing on the generated pre-anonymization data set 55. The anonymization processing unit 51 then transmits the data of the anonymized data set 64 generated in this way to the data providing device 32.

The statistic-enhanced anonymized data selection processing unit 52 is a program that, in response to a request from the data user 4, causes the anonymization processing unit 51 to execute k-anonymization processing a plurality of times on the pre-anonymization data set 55 specified by the data user 4 while changing the k value and the parameters of the anonymization processing. From the plurality of anonymized data sets 64 thus obtained, the statistic-enhanced anonymized data selection processing unit 52 identifies those in which the statistic (average, variance, correlation coefficient, etc.) specified by the data user 4 for the attribute (birth date, hospitalization date, discharge date, etc.) specified by the data user 4 is within the significance level specified by the data user 4 (hereinafter referred to as the allowable significance level). Of these anonymized data sets 64, it provides the one with the highest safety (for example, the anonymized data set with the largest k value) to the data providing device 32. Details of the statistic-enhanced anonymized data selection processing unit 52 will be described later.
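The selection rule described in the preceding paragraph might be sketched in Python as follows. This is only a sketch under stated assumptions: the text up to this point does not define how a statistic is compared against the allowable significance level, so the example simply treats the level as a maximum relative deviation from the statistic of the original data. The function names, the `(k, records)` candidate format, and the sample values are likewise hypothetical, and the candidate data sets are assumed to have been produced already by some k-anonymization routine that is not shown.

```python
from statistics import mean


def relative_deviation(value, reference):
    """Relative difference between two statistics (absolute value if reference is 0)."""
    if reference == 0:
        return abs(value)
    return abs(value - reference) / abs(reference)


def select_anonymized_set(original, candidates, attribute, statistic, allowable_level):
    """Pick the candidate with the largest k whose statistic stays within the level.

    candidates is a list of (k, records) pairs; returns the chosen pair or None.
    ASSUMPTION: "within the allowable significance level" is modeled as a
    relative deviation no greater than allowable_level.
    """
    reference = statistic([r[attribute] for r in original])
    best = None
    for k, records in candidates:
        value = statistic([r[attribute] for r in records])
        if relative_deviation(value, reference) <= allowable_level:
            if best is None or k > best[0]:
                best = (k, records)
    return best


# Hypothetical data: each age is replaced by a numeric representative of its class.
original = [{"age": 31}, {"age": 35}, {"age": 42}, {"age": 48}]
candidates = [
    (2, [{"age": 33}, {"age": 33}, {"age": 45}, {"age": 45}]),
    (4, [{"age": 40}, {"age": 40}, {"age": 40}, {"age": 40}]),
]

chosen = select_anonymized_set(original, candidates, "age", mean, 0.05)
strict = select_anonymized_set(original, candidates, "age", mean, 0.01)
```

With a level of 5% both candidates qualify and the one with k = 4 is chosen as the safer set; with a level of 1% only the k = 2 candidate remains.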

The original data database 53 is a database used for holding and managing the original data, and the original data collected by the data preparation device 31 from each original data provider 2 is sequentially registered in it. As described above, healthcare data is assumed as the original data in the present embodiment. Therefore, as shown in FIG. 3, each piece of original data includes attribute information of the person, such as the patient ID, patient name, patient birth date, hospitalization date, and discharge date, as well as the person's test values. In FIG. 3, one row corresponds to the original data of one person, and FIG. 3 as a whole represents the pre-anonymization data set 55 described above.

In the anonymization condition database 54, a privacy protection condition table 56 is stored. The privacy protection condition table 56 is used to manage conditions for protecting the privacy of the individual who is the owner of the personal information included in the original data, which is set in advance by the data collection / management / provider 3. As shown in FIG. 4, the table includes a personal information range column 56A, a deletion item column 56B, a change item column 56C, a protection item column 56D, and a k value minimum value column 56E.

The personal information range column 56A stores the names (hereinafter referred to as item names) of all data items that are to be handled as personal information, such as ID, name, address, and telephone number, among the information included in the original data that the data preparation device 31 collects from the information processing device 20 of the original data provider 2.

The deletion item column 56B stores the item names of those data items, among the data items to be handled as personal information, that are not provided to the data user 4 in order to protect personal privacy, that is, the data items deleted in the k-anonymization processing (hereinafter referred to as deletion items).

Further, the change item column 56C stores the item names of the respective data items (hereinafter referred to as change items) that can be provided to the data user 4 by changing the contents. In the example of FIG. 4, no change item is set. For example, if data can be provided in a state in which privacy is protected by converting an ID for identifying an individual into another ID, the ID is set as the change item.

Furthermore, the protection item column 56D stores the item names of the data items, called quasi-identifiers, that can be the target of k-anonymization (hereinafter referred to as protection items), and the k value minimum value column 56E stores the minimum k value to be used in the k-anonymization processing executed by the anonymization processing unit 51 (hereinafter referred to as the k value minimum value). This k value minimum value is set in advance by the data collection / management / provider 3.

Note that each data item set in the personal information range is set as one of a deletion item, a change item, or a protection item. Data items that are not set in the personal information range may also be set as deletion items, change items, and / or protection items.

The conditions for protecting personal privacy depend on laws and guidelines, so their definition may differ from country to country and may change over time. To handle such cases, a plurality of privacy protection condition tables 56 may be prepared. For example, when the anonymized data sets 64 stored in the anonymization database 60 of the data providing device 32 are provided to data users 4 in a plurality of countries as described later, the privacy protection condition table 56 corresponding to the country from which the data user 4 accesses the system may be selected, and the anonymized data set 64 to be provided may be changed accordingly.
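To make the roles of the columns of the privacy protection condition table 56 concrete, the following Python sketch applies deletion items and change items to a single record. It is illustrative only: the dictionary keys, the item names, the k value minimum value of 2, and the use of a salted SHA-256 hash as the ID conversion are all assumptions made for this example. Unlike the example of FIG. 4, in which no change item is set, the sketch treats the patient ID as a change item in order to show that case.

```python
import hashlib

# Hypothetical contents of one privacy protection condition table.
privacy_conditions = {
    "personal_information_range": ["patient_id", "patient_name", "address", "age"],
    "deletion_items": ["patient_name"],
    "change_items": ["patient_id"],
    "protection_items": ["address", "age"],
    "k_min": 2,
}


def apply_conditions(record, conditions, salt="example-salt"):
    """Drop deletion items and replace change items with a salted hash.

    Protection items are passed through unchanged; they are left for
    the k-anonymization step, which is not shown here.
    """
    result = {}
    for item, value in record.items():
        if item in conditions["deletion_items"]:
            continue
        if item in conditions["change_items"]:
            digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
            result[item] = digest[:12]  # shortened pseudonym (assumption)
        else:
            result[item] = value
    return result


record = {
    "patient_id": "P001",
    "patient_name": "Alice",
    "address": "Tokyo",
    "age": 34,
    "test_value": 5.6,
}
prepared = apply_conditions(record, privacy_conditions)
```

Selecting a different `privacy_conditions` dictionary per country would correspond to preparing a plurality of privacy protection condition tables as described above.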

The data providing device 32, on the other hand, comprises an anonymization database 60, data catalog information 61, and usage condition information 62, which are stored in the hard disk device 38 (FIG. 1), and a data provision management unit 63, which is stored in the memory 37 (FIG. 1).

The anonymization database 60 is a database used for holding and managing the anonymized data sets 64 created by the anonymization processing unit 51 of the data preparation device 31, and stores a plurality of anonymized data sets 64. An example of the data structure of the anonymized data set 64 is shown in FIG. 5. FIG. 5 shows an example in which deletion items such as the patient ID and patient name have been deleted and k-anonymization processing (k = 2) has been performed with the patient address and patient age as protection items.
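A data set of the kind described for FIG. 5 could be produced, in a very simplified form, as in the Python sketch below. The generalization rules (ten-year age bands and keeping only the first component of the address), the field names, and the sample records are assumptions for illustration and are not taken from the figure. A fixed generalization like this does not guarantee any particular k value in general; it merely happens to yield k = 2 for the sample data.

```python
from collections import Counter


def generalize(record):
    """Drop the deletion items and coarsen the protection items (age, address)."""
    decade = (record["age"] // 10) * 10
    return {
        "age": f"{decade}-{decade + 9}",
        "address": record["address"].split(",")[0].strip(),
        "test_value": record["test_value"],
    }


# Hypothetical pre-anonymization data set.
original = [
    {"patient_id": 1, "patient_name": "A", "age": 34, "address": "Tokyo, Chiyoda", "test_value": 5.1},
    {"patient_id": 2, "patient_name": "B", "age": 38, "address": "Tokyo, Minato", "test_value": 6.3},
    {"patient_id": 3, "patient_name": "C", "age": 41, "address": "Osaka, Kita", "test_value": 5.8},
    {"patient_id": 4, "patient_name": "D", "age": 47, "address": "Osaka, Naniwa", "test_value": 7.0},
]

anonymized = [generalize(r) for r in original]

# The smallest group of records sharing the same (age, address) pair is the achieved k.
group_sizes = Counter((r["age"], r["address"]) for r in anonymized)
achieved_k = min(group_sizes.values())
```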

Further, the data catalog information 61 is information representing an outline of each anonymized data set 64 that can be provided to the data user 4 by the data providing device 32, and has, for example, a table configuration as shown in FIG. 6. In the data catalog shown in FIG. 6, one record (row) corresponds to one anonymized data set 64, and each record consists of a data set ID column 61A, an anonymized data item column 61B, a general data item column 61C, a k value column 61D, a loss statistic column 61E, and the like.

In the data set ID column 61A, an identifier (data set ID) unique to the anonymized data set 64 assigned to the corresponding anonymized data set 64 is stored, and in the anonymized data item column 61B, The item names of all data items that are anonymized in the corresponding anonymized data set 64 are stored.

The general data item column 61C stores the item names of all data items that are not anonymized in the corresponding anonymized data set 64, and the k value column 61D stores the k value of the corresponding anonymized data set 64. Further, the loss statistics column 61E stores the loss information amount (I.L) in the corresponding anonymized data set 64 and various statistics such as the average, variance, and correlation function of the anonymized data set 64.

The usage condition information 62 includes a user requirement table 65. The user requirement table 65 registers the requirements (hereinafter referred to as user requirements) that the data user 4 sets for an arbitrary anonymized data set 64 using the statistics strengthening item designation screen 80 described later with reference to FIG. 8. Specifically, for the anonymized data set 64 specified by the data user 4, the user requirement states that the statistic specified by the data user 4 (mean, variance, correlation coefficient, etc.; hereinafter referred to as a target statistic) of the attribute specified by the data user 4 (date of birth, discharge date, etc.; hereinafter referred to as a target attribute) should satisfy the allowable significance level specified by the data user 4.

The data provision management unit 63 is a program that provides the data user 4 with the anonymized data set 64 designated by the data user 4 in response to a request from the data user 4. In practice, in response to a request from the data user 4, the data provision management unit 63 displays the data set selection screen 70, described later with reference to FIG. 7, on the client terminal 40 of the data user 4. When the data user 4 designates the anonymized data set 64 to be purchased on the data set selection screen 70, the data provision management unit 63 reads the data of that anonymized data set 64 from the anonymization database 60 and provides it to the data user 4.

(2) Configuration of Various Screens

FIG. 7 shows the configuration of the data set selection screen 70 displayed on the client terminal 40 (FIG. 1) based on the screen data transmitted from the data providing device 32. The data set selection screen 70 is a screen for selecting the anonymized data set 64 to be purchased when the data user 4 purchases the anonymized data set 64 from the data collection / management / provider 3.

The data set selection screen 70 comprises an anonymized data set list 71 listing all anonymized data sets 64 that can be provided by the data providing device 32, a purchase button 72, a next button 73, and a cancel button 74.

In the anonymized data set list 71, one record (row) corresponds to one anonymized data set 64, and these records are respectively a check box column 71A, a data set ID column 71B, an anonymized data item column 71C, A general data item column 71D, a k value column 71E, a loss statistic column 71F, and the like are included.

A check box 71AX is displayed in the check box column 71A of each record. The data set ID column 71B, the anonymized data item column 71C, the general data item column 71D, the k value column 71E, the loss statistic column 71F, and so on display the same information as that stored in the corresponding data set ID column 61A, anonymized data item column 61B, general data item column 61C, k value column 61D, loss statistic column 61E, and so on of the data catalog information 61 described above with reference to FIG. 6.

On the data set selection screen 70, the data user 4 can select an anonymized data set 64 for purchase by clicking the check box 71AX corresponding to the desired anonymized data set 64 among those listed in the anonymized data set list 71. A check mark 71AY is then displayed in that check box 71AX.

In the data set selection screen 70, the anonymized data set 64 can be purchased by clicking the purchase button 72 after selecting the desired anonymized data set 64 as described above. In this case, the data of the anonymized data set 64 purchased by the data user 4 is transmitted from the data providing device 32 to the client terminal 40 of the data user 4 (FIG. 1). The data set selection screen 70 can be closed by clicking the cancel button 74.

Alternatively, on the data set selection screen 70, selecting the anonymized data set 64 to be purchased as described above and then clicking the next button 73 opens the statistics strengthening item designation screen 80 shown in FIG. 8. The statistics strengthening item designation screen 80 is a screen for setting the user requirements desired by the data user 4 for the anonymized data set 64 selected on the data set selection screen 70.

The statistics strengthening item designation screen 80 has a target data set ID display field 80A, in which the data set ID of the anonymized data set 64 selected by the data user 4 on the data set selection screen 70 (FIG. 7) is displayed.

Below the target data set ID display field 80A are a plurality of text boxes (hereinafter referred to as target attribute designation text boxes) 80B for designating the target attributes that the data user 4 considers important (that is, the data items that the data user 4 wants to use among the information in the anonymized data set 64 to be purchased from the data collection / management / provider 3). Corresponding to each target attribute designation text box 80B, there are also provided a text box for designating a target statistic (hereinafter referred to as a target statistic designation text box) 80C and a text box for designating an allowable significance level (hereinafter referred to as an allowable significance level designation text box) 80D.

The data user 4 can thus set the desired user requirements for the anonymized data set 64 whose data set ID is displayed in the target data set ID display field 80A by entering the attribute name of a target attribute (the item name of the data item) in a target attribute designation text box 80B of the statistics strengthening item designation screen 80, and entering the desired statistic and allowable significance level in the target statistic designation text box 80C and the allowable significance level designation text box 80D associated with that target attribute designation text box 80B (displayed directly below it in the present embodiment). For example, FIG. 8 shows a state in which the requirement is set that the “average” of the target attribute (data item) “birth date” should satisfy a significance level of “5%”.

The statistics strengthening item designation screen 80 is also provided with two toggle switches 80EX and 80EY for selecting whether the k value is variable or fixed during the k-anonymization processing. One of the two toggle switches 80EX and 80EY is associated with the setting that makes the k value variable and the other with the setting that fixes the k value; clicking the desired toggle switch 80EX or 80EY selects the corresponding setting (variable or fixed).

Further, the statistics strengthening item designation screen 80 is provided with a text box (hereinafter referred to as a k value maximum value designation text box) 80F for designating the maximum k value. By entering a value in the k value maximum value designation text box 80F, the data user 4 can set the maximum k value for the k-anonymization processing that is executed repeatedly, while increasing the k value by 1 each time, in the statistic-enhanced anonymized data selection process described later with reference to FIG.

Furthermore, an OK button 80G and a cancel button 80H are displayed in the lower part of the statistics strengthening item designation screen 80. Clicking the cancel button 80H closes the statistics strengthening item designation screen 80. Clicking the OK button 80G after setting the desired user requirements for the desired anonymized data set 64 as described above causes the anonymized data providing system 30 (FIG. 2) to create an anonymized data set 64 according to the contents set on the statistics strengthening item designation screen 80.
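The settings collected on the statistics strengthening item designation screen 80 could be represented, for example, by the Python data structures sketched below. The class names, field names, the data set ID `DS-001`, and the helper functions are hypothetical. The sketch also assumes that the allowable significance level is entered as a percentage string such as “5%”, and that in the variable mode the k value is swept from the k value minimum value up to the maximum entered in text box 80F. The starting point of the sweep is an assumption, as the text above states only that the k value is increased by 1 each time.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StatisticRequirement:
    target_attribute: str   # entered in text box 80B
    target_statistic: str   # entered in text box 80C
    allowable_level: float  # entered in text box 80D, stored as a fraction


@dataclass
class UserRequirements:
    data_set_id: str        # shown in display field 80A
    items: List[StatisticRequirement]
    k_variable: bool        # choice made with toggle switches 80EX / 80EY
    k_max: int              # entered in text box 80F


def parse_level(text):
    """Convert a percentage string such as '5%' into a fraction such as 0.05."""
    value = float(text.strip().rstrip("%")) / 100.0
    if not 0.0 < value < 1.0:
        raise ValueError(f"allowable significance level out of range: {text!r}")
    return value


def k_values(k_min, requirements):
    """k values to try in the variable mode: k_min, k_min + 1, ..., k_max."""
    return list(range(k_min, requirements.k_max + 1))


requirements = UserRequirements(
    data_set_id="DS-001",
    items=[StatisticRequirement("birth date", "average", parse_level("5%"))],
    k_variable=True,
    k_max=5,
)
sweep = k_values(2, requirements)
```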

(3) Flow of processing relating to provision of anonymized data set

FIG. 9 shows the flow of a series of processes in the information processing system 1, in which the data user 4 sets desired user requirements for a desired anonymized data set 64, the anonymized data providing system 30 creates an anonymized data set 64 that satisfies those user requirements, and the data user 4 purchases the created anonymized data set 64.

This series of processes starts when the data user 4 uses his or her client terminal 40 (FIG. 1) to access the data providing device 32 of the anonymized data providing system 30 and requests presentation of the data catalog based on the data catalog information 61 (FIG. 2) (SP1).

Upon receiving this request, the data providing device 32 transmits the screen data of the data set selection screen 70 described above with reference to FIG. 7 to the client terminal 40 of the data user 4, thereby causing the client terminal 40 to display the data set selection screen 70 (SP2).

When the data user 4 selects the desired anonymized data set 64 on the data set selection screen 70 and clicks the next button 73 (FIG. 7), the client terminal 40 notifies the data providing device 32 of the anonymized data set 64 selected at that time (SP3).

Upon receiving this notification, the data providing device 32 transmits the screen data of the statistics enhancement item designation screen 80 described above with reference to FIG. 8 to the client terminal 40 that sent the notification, thereby causing the client terminal 40 to display the statistics enhancement item designation screen 80 (SP4).

Then, when the data user 4 sets the target attribute, the target statistic, the allowable significance level, the degree of freedom of the k value, the maximum k value, and the like on the statistics enhancement item designation screen 80 and clicks the OK button 80G (FIG. 8), the client terminal 40 transmits these settings to the data providing device 32 as user requirement information (SP5).

When receiving the user requirement information, the data providing device 32 updates the user requirement table 65 of the usage condition information 62 (FIG. 2) based on the received user requirement information (SP6). The data providing device 32 then instructs the data preparation device 31 to create an anonymized data set 64 that satisfies the user requirements requested by the data user 4.

When given this instruction by the data providing device 32, the data preparation device 31 transmits, to the information processing apparatus 20 (FIG. 2) of the corresponding original data provider 2, a transfer request for the original data needed to create the anonymized data set 64 that satisfies the user requirements set by the data user 4 (SP7). When the necessary original data is transferred in response to the transfer request (SP8), the data preparation device 31 generates the pre-anonymization data set 55 (FIG. 2) from the transferred original data and stores the generated pre-anonymization data set 55 in the original data database 53 (SP9).

Subsequently, the data preparation device 31 creates a plurality of anonymized data sets 64 by repeatedly executing the k-anonymization process on the pre-anonymization data set 55 stored in the original data database 53 in step SP9, while sequentially changing the k value or the parameters used in the k-anonymization process (SP10).

Then, the data preparation device 31 determines whether each of the anonymized data sets 64 created in step SP10 satisfies the user requirements set by the data user 4 in step SP5, that is, whether the value of the target statistic of the target attribute satisfies the allowable significance level set by the data user 4 in step SP5 (SP11).

If none of the anonymized data sets 64 created in step SP10 satisfies the user requirements, the data preparation device 31 retries the processing of steps SP10 and SP11 (SP12). If any of the anonymized data sets 64 created in step SP10 satisfies the user requirements, the data preparation device 31 selects, from among them, the anonymized data set 64 having the largest k value and sends it to the data providing device 32.

The data providing device 32 stores the anonymized data set 64 received from the data preparation device 31 in the anonymized database 60 and updates the data catalog information 61 so as to register the anonymized data set 64 in the data catalog (SP13). The data providing device 32 then transmits, to the client terminal 40 of the data user 4, the screen data of the data set selection screen 70 (FIG. 7) reflecting the updated data catalog information 61, thereby causing the client terminal 40 to display the data set selection screen 70 (SP14).

When the data user 4 selects the desired anonymized data set 64 on the data set selection screen 70 and clicks the purchase button 72 (FIG. 7), the client terminal 40 notifies the data providing device 32 to that effect (SP15).

Upon receiving this notification, the data providing device 32 reads the data of the anonymized data set 64 selected by the data user 4 on the data set selection screen 70 from the anonymized database 60 and transmits it to the client terminal 40 of the data user 4 (SP16).

(4) Statistics-enhanced anonymized data selection process

(4-1) Statistics-enhanced anonymized data set creation process

The processing sequence of the statistics-enhanced anonymized data selection process executed by the statistics-enhanced anonymized data selection processing unit 52 (FIG. 2) is shown in FIG.

The statistics-enhanced anonymized data selection processing unit 52 starts the statistics-enhanced anonymized data selection process in step SP10. It first acquires, from the data providing device 32, the degree of freedom of the k value that the data user 4 set using the statistics enhancement item designation screen 80 displayed on the client terminal 40 by the data providing device 32 in step SP4 (SP20), and determines whether the acquired degree of freedom of the k value is “variable” (SP21).

If the result of this determination is affirmative, the statistics-enhanced anonymized data selection processing unit 52 executes the k value variable statistics enhancement process, which repeatedly executes the k-anonymization process while sequentially increasing the k value by 1 (SP22), and then ends the statistics-enhanced anonymized data selection process.

If, on the other hand, the result of this determination is negative, the statistics-enhanced anonymized data selection processing unit 52 executes the k value fixed statistics enhancement process, which repeatedly executes the k-anonymization process while changing the parameters used in the k-anonymization process without changing the k value (SP23), and then ends the statistics-enhanced anonymized data selection process.

(4-2) k value variable statistics enhancement process

FIG. 11 shows the specific processing content of the k value variable statistics enhancement process executed by the statistics-enhanced anonymized data selection processing unit 52 in step SP22 of the statistics-enhanced anonymized data selection process described above with reference to FIG.

On proceeding to step SP22 of the statistics-enhanced anonymized data selection process, the statistics-enhanced anonymized data selection processing unit 52 starts the k value variable statistics enhancement process shown in FIG. 11 and first acquires the preset minimum value Kmin of the k value from the privacy protection condition table 56 (FIG. 4) (SP30).

Subsequently, the statistics-enhanced anonymized data selection processing unit 52 acquires the maximum value Kmax of the k value set by the data user 4 from the data providing device 32 (SP31), and further acquires, via the anonymization processing unit 51, the upper limit value Nmax of the number of trials of the k-anonymization process set in advance in the anonymization condition database 54 (FIG. 2) (SP32).

Subsequently, the statistics-enhanced anonymized data selection processing unit 52 sets the k value for the k-anonymization process in the anonymization processing unit 51 to the minimum value Kmin acquired in step SP30, and sets the variable n, which counts the number of trials of the k-anonymization process, to “1” (SP33).

Thereafter, the statistics-enhanced anonymized data selection processing unit 52 instructs the anonymization processing unit 51 to execute the k-anonymization process on the pre-anonymization data set 55 on which the anonymized data set 64 specified by the data user 4 is based (SP34). In response to this instruction, the anonymization processing unit 51 executes the k-anonymization process on that pre-anonymization data set 55.

Then, the statistics-enhanced anonymized data selection processing unit 52 temporarily stores the data of the anonymized data set 64 created by the k-anonymization process executed at this time in the hard disk device 35 (FIG. 1) (SP35).

Next, the statistics-enhanced anonymized data selection processing unit 52 determines in turn whether the k value set in the anonymization processing unit 51 is less than the maximum value Kmax acquired in step SP31 and whether the value of the variable n is less than the upper limit value Nmax of the number of trials acquired in step SP32 (SP36, SP37).

If the results of both the determination in step SP36 and the determination in step SP37 are affirmative, the statistics-enhanced anonymized data selection processing unit 52 increases the k value and the value of the variable n by 1 each (SP38) and returns to step SP34. It then repeats the processing of steps SP34 to SP38 until a negative result is obtained in step SP36 or step SP37.

By repeating steps SP34 to SP38, the k value is changed sequentially from the minimum value Kmin toward the maximum value Kmax, within the range in which the number of trials of the k-anonymization process (the value of the variable n) does not exceed the upper limit value Nmax. Each anonymized data set 64 obtained by executing the k-anonymization process on the pre-anonymization data set 55 (FIG. 2) on which the anonymized data set 64 designated by the data user 4 is based is stored in the hard disk device 35.
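The loop of steps SP33 to SP38 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the names `sweep_k_values` and `suppress_small_groups`, the record layout, and the use of plain record suppression on a single "age" column as a stand-in for the k-anonymization process are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

Record = Dict[str, str]


def suppress_small_groups(records: List[Record], k: int) -> List[Record]:
    """Hypothetical stand-in for the k-anonymization process: drop every
    record whose "age" value is shared by fewer than k records."""
    counts = Counter(r["age"] for r in records)
    return [r for r in records if counts[r["age"]] >= k]


def sweep_k_values(
    records: List[Record],
    k_min: int,
    k_max: int,
    n_max: int,
    anonymize: Callable[[List[Record], int], List[Record]] = suppress_small_groups,
) -> Dict[int, List[Record]]:
    """Run the anonymizer once per k value, starting at k_min and stopping
    when k reaches k_max or the trial count reaches n_max."""
    results: Dict[int, List[Record]] = {}
    k, n = k_min, 1                           # SP33: k = Kmin, n = 1
    while True:
        results[k] = anonymize(records, k)    # SP34, SP35: anonymize and store
        if k < k_max and n < n_max:           # SP36, SP37: both checks pass
            k += 1                            # SP38: next k value
            n += 1                            # SP38: count the trial
        else:
            break
    return results
```

With this structure the number of stored data sets is the smaller of Kmax − Kmin + 1 and Nmax, which matches the range described in the text.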

When the k value of the k-anonymization process eventually reaches the maximum value Kmax set by the data user 4, or the number of trials of the k-anonymization process (the value of the variable n) reaches the upper limit value Nmax, and a negative result is thus obtained in step SP36 or step SP37, the statistics-enhanced anonymized data selection processing unit 52 calculates the difference between the value of the target statistic of the target attribute designated by the data user 4 in the pre-anonymization data set 55 on which the anonymized data set 64 designated by the data user 4 is based and the value of that target statistic in the anonymized data set 64 obtained by executing the k-anonymization process with the current k value (SP39).

Subsequently, the statistics-enhanced anonymized data selection processing unit 52 determines whether the difference calculated in step SP39 satisfies the allowable significance level set by the data user 4 (SP40). If the result of this determination is negative, it determines whether the k value at that time is the minimum value Kmin (SP41).

If the result of this determination is negative, the statistics-enhanced anonymized data selection processing unit 52 decreases the k value by 1 (SP42), returns to step SP39, and thereafter repeats the processing of steps SP39 to SP42.

When an affirmative result is eventually obtained in step SP40, the statistics-enhanced anonymized data selection processing unit 52 outputs the anonymized data set 64 obtained by executing the k-anonymization process with the k value at that time to the data providing device 32 as the anonymized data set 64 that satisfies the user requirements set by the data user 4 (SP44), and then ends the k value variable statistics enhancement process.

In this case, therefore, among the anonymized data sets 64 for which the value of the statistic designated by the data user 4 (the target statistic) of the attribute designated by the data user 4 (the target attribute) satisfies the allowable significance level designated by the data user 4, the data of the anonymized data set 64 having the largest k value (that is, the safest anonymized data set 64) is output to the data providing device 32. A summary of this anonymized data set 64 is additionally displayed in the anonymized data set list 71 (FIG. 7) of the data set selection screen 70.

If, on the other hand, the statistics-enhanced anonymized data selection processing unit 52 obtains an affirmative result in the determination at step SP41 (that is, the k value has reached the minimum value Kmin), it notifies the data providing device 32 of a warning that the value of the statistic designated by the data user 4 of the attribute designated by the data user 4 does not satisfy the allowable significance level set by the data user 4 (SP43).

Further, the statistics-enhanced anonymized data selection processing unit 52 outputs to the data providing device 32 the data of the anonymized data set 64 obtained by executing the k-anonymization process with the k value at that time (in this case, the minimum value Kmin) (SP44), and then ends the k value variable statistics enhancement process.

In this case, therefore, among the anonymized data sets 64 for which the value of the statistic designated by the data user 4 (the target statistic) of the attribute designated by the data user 4 (the target attribute) does not satisfy the allowable significance level designated by the data user 4, the data of the anonymized data set 64 having the smallest k value (that is, the anonymized data set 64 considered to be the most accurate) is output to the data providing device 32. This anonymized data set 64 is additionally displayed in the anonymized data set list 71 (FIG. 7) of the data set selection screen 70, and the warning is displayed in association with the summary of the anonymized data set 64.
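The selection logic of steps SP39 to SP44 can be sketched as below. This is an illustrative sketch only. The function name `select_largest_satisfying_k` is hypothetical, and the test "the difference satisfies the allowable significance level" is modeled here as "the absolute difference does not exceed a tolerance", which is an assumption; this section does not define how the allowable significance level is evaluated.

```python
from typing import Dict, Tuple


def select_largest_satisfying_k(
    original_stat: float,
    stats_by_k: Dict[int, float],
    k_min: int,
    tolerance: float,
) -> Tuple[int, bool]:
    """Walk down from the largest k value tried and return (k, warning).

    stats_by_k maps each k value to the target statistic of the anonymized
    data set produced with that k. It is assumed to contain every integer k
    from k_min up to the largest k tried.
    warning is True when no k value satisfied the tolerance and the data set
    for k_min is returned anyway.
    """
    k = max(stats_by_k)                               # last k value tried
    while True:
        diff = abs(stats_by_k[k] - original_stat)     # SP39
        if diff <= tolerance:                         # SP40 (assumed criterion)
            return k, False                           # SP44
        if k == k_min:                                # SP41
            return k, True                            # SP43 warning, then SP44
        k -= 1                                        # SP42


if __name__ == "__main__":
    # Illustrative values only: mean age per k value versus the original mean.
    stats = {2: 40.1, 3: 40.4, 4: 41.5}
    print(select_largest_satisfying_k(40.0, stats, k_min=2, tolerance=0.5))
```

Because the search starts at the largest k value and stops at the first data set that passes, the result is the safest data set that still meets the requirement, as the text describes.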

(4-3) k value fixed statistics enhancement process

FIGS. 12A and 12B show the specific processing content of the k value fixed statistics enhancement process executed by the statistics-enhanced anonymized data selection processing unit 52 in step SP23 of the statistics-enhanced anonymized data selection process described above with reference to FIG.

On proceeding to step SP23 of the statistics-enhanced anonymized data selection process, the statistics-enhanced anonymized data selection processing unit 52 starts the k value fixed statistics enhancement process shown in FIGS. 12A and 12B and first acquires the preset minimum value Kmin of the k value from the privacy protection condition table 56 (SP50).

Subsequently, the statistics-enhanced anonymized data selection processing unit 52 acquires the maximum value Kmax of the k value set by the data user 4 from the data providing device 32 (SP51), and further acquires, via the anonymization processing unit 51, the upper limit value Nmax of the number of trials of the k-anonymization process stored in advance in the anonymization condition database 54 (FIG. 2) (SP52).

Next, the statistic strengthening anonymization data selection processing unit 52 generates the same number of different k-anonymization parameters as the upper limit value Nmax of the number of trials of k-anonymization processing (SP53).

Each k-anonymization parameter consists mainly of a combination of two elements: the k-anonymization target items and the deletion record threshold. Of these, the k-anonymization target items element is a parameter that designates the protection items (see FIG. 4) to be subjected to k-anonymization. Protection items that are not designated as k-anonymization targets are deleted before the k-anonymization process is executed and are therefore not provided to the data user 4. The target attribute (data item) specified on the statistics enhancement item designation screen 80 described above with reference to FIG. 8 is always included in the k-anonymization target items.

The deletion record threshold is a parameter that specifies the upper limit on the number of original data records that may be deleted when the pre-anonymization data set 55 is k-anonymized in the k-anonymization process. For example, when the deletion record threshold is “0”, the k-anonymization process is executed until all of the original data satisfies k-anonymity. When the deletion record threshold is “1000”, on the other hand, the k-anonymization process ends once the number of original data records that do not satisfy k-anonymity falls to “1000” or fewer in the course of the process, after those records have been deleted.
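The role of the deletion record threshold can be illustrated with a short Python sketch. The function name `apply_deletion_threshold`, the record layout, and the treatment of the k-anonymization target items as a list of quasi-identifier column names are assumptions made for this example; the generalization step that a real k-anonymization process would perform between checks is left out.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence

Record = Dict[str, str]


def apply_deletion_threshold(
    records: List[Record],
    quasi_identifiers: Sequence[str],
    k: int,
    deletion_threshold: int,
) -> Optional[List[Record]]:
    """Return the records with all k-anonymity violators deleted, provided the
    number of violators does not exceed deletion_threshold. Return None when
    too many records still violate k-anonymity, meaning that further
    generalization would be needed before the process can end."""
    # Group records by their combination of quasi-identifier values.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    group_sizes = Counter(keys)
    # A record violates k-anonymity if its group has fewer than k members.
    violators = [i for i, key in enumerate(keys) if group_sizes[key] < k]
    if len(violators) > deletion_threshold:
        return None
    drop = set(violators)
    return [r for i, r in enumerate(records) if i not in drop]
```

With a threshold of 0 the function only succeeds when every record already satisfies k-anonymity, which corresponds to the first case in the text; a larger threshold lets the process end earlier at the cost of deleted records.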

An example of a group of k-anonymization parameters, for the case where the upper limit value Nmax of the number of trials of the k-anonymization process is “8”, the protection items are “age”, “sex”, and “address”, and the data item specified on the statistics enhancement item designation screen 80 is “age”, is shown in FIG.
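One way such a parameter group could be generated is sketched below. The figure itself is not reproduced in this text, so the enumeration rule here is an assumption: every subset of the protection items that contains the target attribute is paired with each candidate deletion record threshold, and the first Nmax combinations are kept. The function name `generate_k_anonymization_parameters` and the threshold values 0 and 1000 (borrowed from the example above) are likewise illustrative.

```python
from itertools import combinations, islice, product
from typing import Dict, List, Sequence, Tuple, Union

Parameter = Dict[str, Union[Tuple[str, ...], int]]


def generate_k_anonymization_parameters(
    protection_items: Sequence[str],
    target_attribute: str,
    deletion_thresholds: Sequence[int],
    n_max: int,
) -> List[Parameter]:
    """Enumerate up to n_max distinct k-anonymization parameters. Each one
    pairs a set of k-anonymization target items, which always includes the
    target attribute, with a deletion record threshold."""
    others = [p for p in protection_items if p != target_attribute]
    # Every subset of the other protection items, plus the target attribute.
    item_sets = [
        (target_attribute,) + combo
        for size in range(len(others) + 1)
        for combo in combinations(others, size)
    ]
    candidates = (
        {"target_items": items, "deletion_threshold": threshold}
        for items, threshold in product(item_sets, deletion_thresholds)
    )
    return list(islice(candidates, n_max))


if __name__ == "__main__":
    for p in generate_k_anonymization_parameters(
        ["age", "sex", "address"], "age", [0, 1000], n_max=8
    ):
        print(p)
```

For three protection items and two thresholds this rule yields exactly eight distinct parameters, which happens to match Nmax = 8 in the example, although the actual group in the figure may be chosen differently.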

Subsequently, the statistics-enhanced anonymized data selection processing unit 52 sets the k value for the k-anonymization process in the anonymization processing unit 51 to the minimum value Kmin acquired in step SP50, and sets the variable n, which counts the number of trials of the k-anonymization process, to “1” (SP54).

Thereafter, the statistics-enhanced anonymized data selection processing unit 52 selects one not-yet-selected k-anonymization parameter from the Nmax k-anonymization parameters generated in step SP53 (SP55) and instructs the anonymization processing unit 51 to execute the k-anonymization process using the selected k-anonymization parameter (SP56). In response to this instruction, the anonymization processing unit 51 executes the k-anonymization process on the corresponding pre-anonymization data set 55 (FIG. 2).

Then, the statistics-enhanced anonymized data selection processing unit 52 temporarily stores the data of the anonymized data set 64 created by the k-anonymization process executed at this time in the hard disk device 35 (FIG. 1) (SP57).

Next, the statistics-enhanced anonymized data selection processing unit 52 determines in turn whether the k value set in the anonymization processing unit 51 is less than the maximum value Kmax acquired in step SP51 and whether the value of the variable n is less than the upper limit value Nmax of the number of trials acquired in step SP52 (SP58, SP59).

If the results of both the determination in step SP58 and the determination in step SP59 are affirmative, the statistics-enhanced anonymized data selection processing unit 52 increases the value of the variable n by 1 (SP60) and returns to step SP55. Thereafter, while sequentially switching the k-anonymization parameter selected in step SP55 to another unprocessed k-anonymization parameter, it repeats the processing of steps SP55 to SP60 until a negative result is obtained in step SP58 or step SP59.

By repeating steps SP55 to SP60, the k-anonymization process is executed for all Nmax k-anonymization parameters generated in step SP53, and each anonymized data set 64 obtained by these k-anonymization processes is stored in the hard disk device 35.

When the k value of the k-anonymization process eventually reaches the maximum value Kmax set by the data user 4, or the number of trials of the k-anonymization process (the value of the variable n) reaches the upper limit value Nmax, and a negative result is thus obtained in step SP58 or step SP59, the statistics-enhanced anonymized data selection processing unit 52 calculates, for each anonymized data set 64 obtained by the repeated processing of steps SP55 to SP60, the difference between the value of the target statistic designated by the data user 4 in the pre-anonymization data set 55 on which the anonymized data set 64 designated by the data user 4 is based and the value of the target statistic in that anonymized data set 64 (SP61).

Based on the calculation results of step SP61, the statistics-enhanced anonymized data selection processing unit 52 identifies, from among the anonymized data sets 64 obtained by the repeated processing of steps SP55 to SP60, the anonymized data set 64 whose target statistic value differs least from the value of the target statistic designated by the data user 4 in the pre-anonymization data set 55 on which the anonymized data set 64 designated by the data user 4 is based (SP62).

Subsequently, the statistics-enhanced anonymized data selection processing unit 52 determines whether the difference between the value of the target statistic of the target attribute designated by the data user 4 in the anonymized data set 64 identified in step SP62 and the value of that target statistic in the pre-anonymization data set 55 satisfies the allowable significance level set by the data user 4 (SP63).

If the result of this determination is affirmative, the statistics-enhanced anonymized data selection processing unit 52 outputs the anonymized data set 64 identified in step SP62 to the data providing device 32 as the anonymized data set 64 that satisfies the user requirements set by the data user 4 (SP69), and then ends the k value fixed statistics enhancement process.

In this case, therefore, among the anonymized data sets 64 for which the value of the statistic designated by the data user 4 (the target statistic) of the attribute designated by the data user 4 (the target attribute) satisfies the allowable significance level designated by the data user 4, the data of the anonymized data set 64 whose target statistic value of the target attribute is closest to that of the original pre-anonymization data set 55 is output to the data providing device 32. A summary of this anonymized data set 64 is additionally displayed in the anonymized data set list 71 (FIG. 7) of the data set selection screen 70.

If, on the other hand, the statistics-enhanced anonymized data selection processing unit 52 obtains a negative result in the determination at step SP63, it determines whether the k value set in the anonymization processing unit 51 for the k-anonymization process is the maximum value Kmax acquired in step SP51 (SP64).

If the result of this determination is affirmative, the statistics-enhanced anonymized data selection processing unit 52 notifies the data providing device 32 of a warning that the value of the statistic designated by the data user 4 (the target statistic) of the attribute designated by the data user 4 (the target attribute) does not satisfy the allowable significance level set by the data user 4 (SP65).

Further, the statistics-enhanced anonymized data selection processing unit 52 outputs to the data providing device 32 the data of the anonymized data set 64 identified in step SP62 from among the anonymized data sets 64 obtained by executing the k-anonymization process with the k value set at that time (SP69), and then ends the k value fixed statistics enhancement process.

In this case, therefore, among the anonymized data sets 64 for which the value of the target statistic of the target attribute does not satisfy the allowable significance level designated by the data user 4, the data of the anonymized data set 64 whose target statistic value of the target attribute differs least from that of the original pre-anonymization data set 55 (FIG. 2) is output to the data providing device 32. This anonymized data set 64 is additionally displayed in the anonymized data set list 71 (FIG. 7) of the data set selection screen 70, and the warning is displayed in association with the summary of the anonymized data set 64.

If, on the other hand, the statistics-enhanced anonymized data selection processing unit 52 obtains a negative result in the determination at step SP64, it requests the data providing device 32 to display on the client terminal 40 a confirmation screen (not shown) asking the data user 4 whether to retry with the k value increased by 1 (SP66).

The data providing device 32 then transmits predetermined screen data to the corresponding client terminal 40, thereby causing the client terminal 40 to display the confirmation screen, which carries a warning that an anonymized data set 64 in which the target statistic of the target attribute satisfies the allowable significance level designated by the data user 4 cannot be created with the current k value. The data providing device 32 also forwards to the data preparation device 31 the answer, transmitted from the client terminal 40, that the data user 4 gave on the confirmation screen as to whether to retry.

Then, the statistics-enhanced anonymized data selection processing unit 52 determines, based on the answer forwarded from the data providing device 32, whether the data user 4 has chosen to retry (SP67). If the result of this determination is affirmative, it increases the k value by 1 (SP68), returns to step SP55, and thereafter processes step SP55 and the subsequent steps in the same manner as described above.

If, on the other hand, the statistics-enhanced anonymized data selection processing unit 52 obtains a negative result in the determination at step SP67, it sends the data providing device 32 a warning that the value of the statistic designated by the data user 4 (the target statistic) of the attribute designated by the data user 4 (the target attribute) does not satisfy the allowable significance level set by the data user 4 (SP65).

In this case, therefore, among the anonymized data sets 64 for which the value of the target statistic of the target attribute does not satisfy the allowable significance level designated by the data user 4, the data of the anonymized data set 64 whose target statistic value of the target attribute differs least from that of the original pre-anonymization data set 55 (FIG. 2) is output to the data providing device 32. A summary of this anonymized data set 64 is additionally displayed in the anonymized data set list 71 (FIG. 7) of the data set selection screen 70.
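The overall control flow of the k value fixed statistics enhancement process (steps SP55 to SP69) can be summarized in the following Python sketch. It is a simplified illustration, not the embodiment itself. The function name `fixed_k_selection`, the callback `stat_for` (which stands in for running the k-anonymization process with a given k value and parameter and then computing the target statistic), and the callback `confirm_retry` (which stands in for the confirmation screen) are hypothetical. The allowable significance level is again modeled as an absolute-difference tolerance, which is an assumption, and the early exit of step SP58 is omitted so that every parameter is always tried.

```python
from typing import Callable, Sequence, Tuple, TypeVar

P = TypeVar("P")


def fixed_k_selection(
    original_stat: float,
    k_min: int,
    k_max: int,
    parameters: Sequence[P],
    stat_for: Callable[[int, P], float],
    tolerance: float,
    confirm_retry: Callable[[int], bool],
) -> Tuple[int, P, bool]:
    """Return (k, best_parameter, warning) for the k value fixed process."""
    k = k_min                                             # SP54
    while True:
        # SP55 to SP61: try every parameter at this k and measure the gap.
        diffs = [
            (abs(stat_for(k, p) - original_stat), i)
            for i, p in enumerate(parameters)
        ]
        best_diff, best_index = min(diffs)                # SP62
        best = parameters[best_index]
        if best_diff <= tolerance:                        # SP63 (assumed criterion)
            return k, best, False                         # SP69
        if k >= k_max:                                    # SP64
            return k, best, True                          # SP65, SP69
        if not confirm_retry(k):                          # SP66, SP67
            return k, best, True                          # SP65, SP69
        k += 1                                            # SP68


if __name__ == "__main__":
    # Toy statistic: the gap grows with both k and the parameter value.
    toy_stat = lambda k, p: 40.0 + 0.1 * k * p
    print(fixed_k_selection(40.0, 2, 3, [1, 2, 3], toy_stat, 0.25, lambda k: True))
```

Passing the user interaction in as a callback keeps the sketch testable without a client terminal; in the embodiment the answer travels from the client terminal 40 through the data providing device 32 to the data preparation device 31.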

(5) Effects of the present embodiment

As described above, in the information processing system 1 of the present embodiment, the k-anonymization process is executed a plurality of times on the pre-anonymization data set 55 while the k value or the anonymization parameters are changed sequentially, and from the anonymized data sets 64 thus obtained, an anonymized data set 64 that satisfies the user requirements set in advance by the data user 4 is selectively provided to the data user 4. An anonymized data set 64 that meets the needs of the data user 4 can therefore be provided to the data user 4.

(6) Other embodiments

In the above-described embodiment, the anonymized data providing system 30 of the data collection / management / provider 3 is configured from two devices, the data preparation device 31 and the data providing device 32. However, the present invention is not limited to this; the functions of the data preparation device 31 and the data providing device 32 may be implemented in a single information processing device, so that the anonymized data providing system 30 is configured from one information processing device.

In the above-described embodiment, the present invention is applied to the information processing system 1 that anonymizes healthcare data and provides it to the data user 4. However, the present invention is not limited to this and can be widely applied to various other information processing systems that anonymize data other than healthcare data and provide it to the data user 4.

The present invention can be widely applied to information processing systems that provide data after anonymizing or obscuring information related to privacy.

1 ... Information processing system, 2 ... Original data provider, 3 ... Data collection / management / provider, 4 ... Data user, 20 ... Information processing device, 30 ... Anonymized data provision system, 31 ... Data preparation device, 32 ... Data providing device, 33, 36 ... CPU, 34, 37 ... Memory, 35, 38 ... Hard disk device, 40 ... Client terminal, 51 ... Anonymization processing unit, 52 ... Statistics strengthening anonymized data selection processing unit, 53 ... Original data database, 54 ... Anonymization condition database, 55 ... Pre-anonymization data set, 56 ... Privacy protection condition table, 60 ... Anonymization database, 61 ... Data catalog, 62 ... Usage condition information, 63 ... Data provider, 64 ... Anonymized data set, 65 ... User requirement table, 70 ... Data catalog selection screen, 80 ... Statistics strengthening items specified screen.

Claims

An anonymized data providing device that anonymizes original data and provides it to a data user, comprising:
an anonymization processing unit that executes an anonymization process on a data set of the original data;
an anonymized data selection processing unit that controls the anonymization processing unit; and
a data providing unit that manages the data set anonymized by the anonymization process as an anonymized data set and provides the anonymized data set to the data user in response to a request from the data user, wherein
the data user
selects a desired data set and sets, as a user requirement, an allowable significance level of a desired statistic of a desired attribute for the data set,
the anonymized data selection processing unit
controls the anonymization processing unit to execute the anonymization process a plurality of times on the data set selected by the data user,
calculates, for each of the plurality of anonymized data sets obtained by the plurality of anonymization processes, the statistic set by the data user, and
compares the calculated statistic of each anonymized data set with the statistic of the data set of the original data, and selects the anonymized data set for which the difference in the statistic satisfies the allowable significance level set by the data user as the anonymized data set that satisfies the user requirement, and
the data providing unit
provides the data user with the anonymized data set selected by the anonymized data selection processing unit.
The anonymized data providing device according to claim 1, wherein the data providing unit
presents the data user with a screen for setting the attribute, the statistic, and the allowable significance level desired by the data user for the data set selected by the data user, and
notifies the anonymized data selection processing unit of the attribute, the statistic, and the allowable significance level that the data user has set for the data set using the screen.
The anonymized data providing device according to claim 1, wherein
the anonymization processing unit
executes k-anonymization processing as the anonymization processing, and
the anonymized data selection processing unit
controls the anonymization processing unit to execute the k-anonymization processing on the data set a plurality of times while sequentially changing the k value used in the k-anonymization processing.
The anonymized data providing device according to claim 3, wherein the anonymized data selection processing unit
selects, from among the plurality of anonymized data sets obtained by the plurality of executions of the k-anonymization processing, the anonymized data set that satisfies the user requirement and has the largest k value.
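One way to read this selection rule is as a maximum over the runs that met the user requirement. The sketch below assumes each run is summarized by a hypothetical `Run` record holding its k value, its statistic difference, and a flag for whether the requirement was met; the record layout and the numbers are illustrative only.

```python
from typing import NamedTuple, Optional, Sequence


class Run(NamedTuple):
    """Hypothetical summary of one k-anonymization run."""
    k: int
    difference: float
    satisfies_requirement: bool


def select_largest_k(runs: Sequence[Run]) -> Optional[Run]:
    """Return the run with the largest k among those meeting the requirement."""
    satisfying = [run for run in runs if run.satisfies_requirement]
    if not satisfying:
        return None  # no run met the user requirement
    return max(satisfying, key=lambda run: run.k)


# Illustrative values only.
runs = [
    Run(k=2, difference=0.03, satisfies_requirement=True),
    Run(k=3, difference=0.05, satisfies_requirement=True),
    Run(k=4, difference=0.11, satisfies_requirement=False),
]
print(select_largest_k(runs))
```

Preferring the largest k keeps the strongest anonymity that the data user's requirement still tolerates.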
The anonymized data providing device according to claim 1, wherein
the anonymization processing unit
k-anonymizes the data set of the original data as the anonymization processing, and
the anonymized data selection processing unit
controls the anonymization processing unit to execute the k-anonymization processing on the data set a plurality of times while sequentially changing a parameter of the k-anonymization processing.
The anonymized data providing device according to claim 5, wherein the anonymized data selection processing unit
selects, from among the plurality of anonymized data sets obtained by the plurality of executions of the k-anonymization processing, the anonymized data set that satisfies the user requirement and for which the statistic set by the data user has the smallest difference from the statistic of the data set.
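This rule can be read as a minimum over the statistic differences of the runs that met the user requirement. The sketch below is an assumption-laden illustration: `Candidate` is a hypothetical record, `label` stands in for whatever parameter setting produced the run, and the numbers are invented.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    """Hypothetical summary of one run made with a given parameter setting."""
    label: str
    difference: float
    satisfies_requirement: bool


def select_closest(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the satisfying candidate whose statistic is closest to the original."""
    satisfying = [c for c in candidates if c.satisfies_requirement]
    if not satisfying:
        return None  # no run met the user requirement
    return min(satisfying, key=lambda c: c.difference)


# Illustrative values only.
candidates = [
    Candidate(label="setting-A", difference=0.05, satisfies_requirement=True),
    Candidate(label="setting-B", difference=0.03, satisfies_requirement=True),
    Candidate(label="setting-C", difference=0.11, satisfies_requirement=False),
]
print(select_closest(candidates))
```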
An anonymized data providing method executed in an anonymized data providing device that anonymizes original data and provides it to a data user, wherein
the anonymized data providing device comprises:
an anonymization processing unit that executes anonymization processing on a data set of the original data;
an anonymized data selection processing unit that controls the anonymization processing unit; and
a data providing unit that manages the data set that has been anonymized as an anonymized data set and provides the anonymized data set to the data user in response to a request from the data user,
the data user
selects a desired data set and sets, as a user requirement, an allowable significance level for a desired statistic of a desired attribute of the data set, and
the method comprises:
a first step in which the anonymized data selection processing unit controls the anonymization processing unit to execute the anonymization processing a plurality of times on the data set selected by the data user;
a second step in which the anonymized data selection processing unit calculates the statistic set by the data user for each of the plurality of anonymized data sets obtained by the plurality of executions of the anonymization processing;
a third step in which the anonymized data selection processing unit compares the calculated statistic of each anonymized data set with the statistic of the data set of the original data, and selects, as the anonymized data set that satisfies the user requirement, an anonymized data set for which the difference in the statistic satisfies the allowable significance level set by the data user; and
a fourth step in which the data providing unit provides the data user with the anonymized data set selected by the anonymized data selection processing unit.
The anonymized data providing method, wherein the data providing unit
presents the data user with a screen for setting the attribute, the statistic, and the allowable significance level desired by the data user for the data set selected by the data user, and
notifies the anonymized data selection processing unit of the attribute, the statistic, and the allowable significance level that the data user has set for the data set using the screen.
The anonymized data providing method according to claim 7, wherein
the anonymization processing unit
executes k-anonymization processing as the anonymization processing, and
in the first step, the anonymized data selection processing unit
controls the anonymization processing unit to execute the k-anonymization processing on the data set a plurality of times while sequentially changing the k value used in the k-anonymization processing.
The anonymized data providing method according to claim 9, wherein, in the third step, the anonymized data selection processing unit
selects, from among the plurality of anonymized data sets obtained by the plurality of executions of the k-anonymization processing, the anonymized data set that satisfies the user requirement and has the largest k value.
The anonymized data providing method according to claim 7, wherein
the anonymization processing unit
k-anonymizes the data set of the original data as the anonymization processing, and
in the first step, the anonymized data selection processing unit
controls the anonymization processing unit to execute the k-anonymization processing on the data set a plurality of times while sequentially changing a parameter of the k-anonymization processing.
The anonymized data providing method according to claim 11, wherein, in the third step, the anonymized data selection processing unit
selects, from among the plurality of anonymized data sets obtained by the plurality of executions of the k-anonymization processing, the anonymized data set that satisfies the user requirement and for which the statistic set by the data user has the smallest difference from the statistic of the data set.