CN108416227A - Big data platform privacy protection evaluation method and device based on the Del entropy method - Google Patents

Info

Publication number
CN108416227A
Authority
CN
China
Prior art keywords
evaluation
secret protection
data
privacy protection
evaluation index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810171758.XA
Other languages
Chinese (zh)
Inventor
史玉良
张世栋
李庆忠
陈玉
臧淑娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University
Priority to CN201810171758.XA
Publication of CN108416227A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses a privacy protection evaluation method and device for a big data platform based on the Del entropy method. The method comprises: (1) constructing a privacy protection hierarchical analysis model based on probability statistics and solving for the privacy protection evaluation indices of the privacy-protected data; (2) establishing, from the indices obtained in step (1), a privacy protection fuzzy comprehensive evaluation model based on the Del entropy method; and (3) obtaining the comprehensive evaluation score of the fuzzy comprehensive evaluation model and deriving the final evaluation result from the score and the grade standard. The invention thus establishes both a privacy protection fuzzy comprehensive evaluation model and a privacy protection hierarchical analysis model. Two-stage backtracking hierarchical analysis yields the evaluation index values and evaluation parameter values of the overall distribution of the privacy-protected data, and the single-factor evaluation vectors are computed from these values. This reduces the interference of human factors, improves the accuracy and fairness of the weights assigned to the evaluation indices, and enables a visual evaluation of the block-obfuscation privacy protection method.

Description

Big data platform privacy protection evaluation method and device based on the Del entropy method

Technical Field

The invention relates to the technical field of data security, and in particular to a privacy protection evaluation method and device for a big data platform based on the Del entropy method.

Background

With the continuous development of information technology and society, and the growing range of mobile Internet, Internet of Things, and cloud computing applications, data volumes have grown exponentially and the era of big data has arrived. In recent years big data technology has matured and big data applications have multiplied. While attention has focused on the storage, processing, and migration of big data, its privacy protection has also gradually gained importance.

Data privacy protection has long been an active research topic, and many protection methods have been proposed. Although these methods protect data privacy to some extent, each has shortcomings when applied to big data: encryption has a high reconstruction cost and makes ciphertext hard to process; differential privacy adds noise that is hard to remove and easily distorts the data; obfuscation processes data inefficiently and reconstructs it slowly; and k-anonymity tends to lose part of the data, which cannot then be reconstructed.

To improve the privacy protection of big data, a privacy protection method based on block obfuscation has been proposed. The method first constructs a trusted interaction model among three parties (a trusted third party, the cloud service provider, and the tenant), then splits the data into multiple data blocks and obfuscates the relationship between data blocks and data slices, and finally stores the data relationships with the trusted third party so that the data can be reconstructed quickly. Block obfuscation not only protects users' data privacy while the data remains in plaintext, but also preserves data integrity and improves processing efficiency.

Although block obfuscation can protect the privacy of big data in plaintext, how well it does so still needs to be verified. Research on privacy protection evaluation has produced many results and a number of evaluation methods, but none of them evaluates the privacy protection effect of the block-obfuscation method well.

Summary of the Invention

To overcome the above deficiencies of the prior art, the invention provides a privacy protection evaluation method and device for a big data platform based on the Del entropy method, which enables a visual privacy evaluation of the block-obfuscation technique. The invention establishes a privacy protection fuzzy comprehensive evaluation model based on the Del entropy method and a privacy protection hierarchical analysis model based on probability statistics. Two-stage backtracking hierarchical analysis yields the evaluation index values and evaluation parameter values of the overall distribution of the privacy-protected data, from which the single-factor evaluation vectors are computed, reducing the interference of human factors. The invention also combines and improves the Delphi method, which relies on subjective judgment, and the entropy method, which relies on objective judgment. This improves the accuracy and fairness of the weights assigned to the evaluation indices, enables a visual evaluation of the block-obfuscation privacy protection method, and provides a theoretical basis and direction for further optimizing that method.

The technical solution adopted by the invention is as follows.

A privacy protection evaluation method for a big data platform based on the Del entropy method comprises the following steps:

(1) constructing a privacy protection hierarchical analysis model based on probability statistics and solving for the privacy protection evaluation indices of the privacy-protected data;

(2) establishing, from the privacy protection evaluation indices obtained in step (1), a privacy protection fuzzy comprehensive evaluation model based on the Del entropy method;

(3) obtaining the comprehensive evaluation score of the privacy protection fuzzy comprehensive evaluation model, and deriving the final evaluation result from the score and the grade standard.

Further, the privacy protection hierarchical analysis model based on probability statistics comprises:

(1-1) an input layer: the privacy-protected user data to be evaluated;

(1-2) a data storage mode layer, which takes the privacy-protected user data as input and, according to differences in user requirements, SaaS application requirements, and the interests of the cloud service provider, stores that data in different data storage modes (DSMs);

(1-3) a data block layer, which divides the tenant data in each data storage mode into different data blocks;

(1-4) a user layer: the user data stored in each DSM and data block;

(1-5) a privacy protection evaluation index layer, which defines the privacy protection evaluation indices and their calculation formulas;

(1-6) a backtracking layer, which computes the overall evaluation indices and evaluation parameter values of the privacy-protected data through two stages of backtracking: from the user layer to the data storage mode layer, and from the data storage mode layer to the input layer.

Further, computing the overall evaluation indices and evaluation parameter values of the privacy-protected data through the two-stage backtracking comprises:

computing, from the definitions and calculation formulas of the privacy protection evaluation indices, the index values of each user under each DSM; then computing, from the index values of all users, the evaluation indices and evaluation parameter values of each DSM; and finally computing, from the evaluation indices and evaluation parameter values of all DSMs, the overall privacy protection evaluation indices and evaluation parameter values of the privacy-protected data.

Further, the privacy protection evaluation indices comprise the privacy attribute leakage probability, the privacy attribute leakage ratio, the privacy attribute value entropy, and the block deviation degree; the evaluation parameter values comprise the mean and variance of each of these four indices.
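The two-stage backtracking described above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function names `summarize` and `backtrack`, the input layout, the use of population variance, and the unweighted averaging of DSM means in the second stage are all assumptions, because the text does not give the aggregation formulas.

```python
from statistics import fmean, pvariance


def summarize(values):
    """Return (mean, variance) of a list of index values.

    Population variance is an assumption; the source does not say
    which variance estimator is used.
    """
    return fmean(values), pvariance(values)


def backtrack(per_tenant):
    """Aggregate one evaluation index in two stages.

    per_tenant maps a DSM name to the list of index values (for
    example PALP) measured for each tenant stored in that DSM.
    """
    # Stage 1: user layer -> data storage mode layer.
    per_dsm = {dsm: summarize(vals) for dsm, vals in per_tenant.items()}
    # Stage 2: data storage mode layer -> input layer.
    # Hypothetical choice: summarize the per-DSM means without weighting.
    overall = summarize([mean for mean, _ in per_dsm.values()])
    return per_dsm, overall


if __name__ == "__main__":
    palp = {"DSM1": [0.1, 0.3], "DSM2": [0.2, 0.2]}
    per_dsm, overall = backtrack(palp)
    print(per_dsm)
    print(overall)
```

The same routine would be run once per index (leakage probability, leakage ratio, value entropy, block deviation degree) to obtain the eight evaluation parameter values.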

Further, establishing the privacy protection fuzzy comprehensive evaluation model based on the Del entropy method comprises:

(2-1) determining the evaluation factor set and the comment set of the privacy protection evaluation object, and setting the grade standard that divides the privacy protection evaluation levels;

(2-2) performing a single-factor evaluation on each evaluation index to obtain its evaluation vector, and constructing the evaluation matrix from these vectors;

(2-3) obtaining the weight of each privacy protection evaluation index with a combined weighting method based on the Delphi method and the entropy method;

(2-4) obtaining the fuzzy comprehensive evaluation score and the evaluation result through a fuzzy matrix composition operation.

Further, the evaluation factor set of the privacy protection evaluation object comprises the privacy attribute leakage probability, the privacy attribute leakage ratio, the privacy attribute value entropy, and the block deviation degree; the comment set comprises "very good", "good", "average", and "poor".

Further, obtaining the weight of each evaluation index with the combined weighting method based on the Delphi method and the entropy method comprises:

drawing from a pool of expert reviewers m times to form m groups, each draw selecting a fixed number of experts whose membership differs between draws, and obtaining the scores that the m groups give to the evaluation indices;

constructing a decision matrix from the scores of the m groups;

computing, from the decision matrix, the sum of the scores of all groups under each privacy protection evaluation index;

dividing each group's score under each index by that sum to obtain the contribution of each group's score under the index;

computing, from these contributions, the entropy of the group scores under each privacy protection evaluation index;

computing the importance of each privacy protection evaluation index and dividing it by the total importance to obtain the weight of each index.
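The weighting steps above can be sketched in Python. The source names the steps but gives no formulas here, so this sketch assumes the textbook entropy weight method: entropy normalized by 1/ln(m), and importance taken as one minus the entropy. The function name `entropy_weights` and the equal-weight fallback are hypothetical.

```python
import math


def entropy_weights(scores):
    """Derive index weights from expert-group scores.

    scores[i][j] is the score that expert group i gives to evaluation
    index j; all scores are assumed positive and m >= 2 groups are needed.
    """
    m = len(scores)
    if m < 2:
        raise ValueError("at least two expert groups are required")
    n = len(scores[0])
    k = 1.0 / math.log(m)  # normalizes entropy into [0, 1] (assumed form)
    importance = []
    for j in range(n):
        column = [scores[i][j] for i in range(m)]
        total = sum(column)                     # sum over all groups
        contrib = [x / total for x in column]   # contribution of each group
        entropy = -k * sum(p * math.log(p) for p in contrib if p > 0)
        # Assumed importance: 1 - entropy, clamped against rounding error.
        importance.append(max(0.0, 1.0 - entropy))
    total_importance = sum(importance)
    if total_importance < 1e-12:
        # All groups agree on every index: fall back to equal weights.
        return [1.0 / n] * n
    return [d / total_importance for d in importance]


if __name__ == "__main__":
    decision_matrix = [
        [8, 6, 7, 9],
        [8, 9, 7, 5],
        [8, 3, 7, 9],
    ]
    print(entropy_weights(decision_matrix))
```

Under this assumed form, an index on which every group gives the same score carries no information and receives a weight near zero, while indices with more divergent scores receive larger weights.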

Further, obtaining the fuzzy comprehensive evaluation score and the evaluation result through the fuzzy matrix composition operation comprises:

composing the evaluation matrix of the evaluation indices with the weights of the evaluation indices to obtain the evaluation vector;

normalizing the evaluation vector and applying a weighted average to obtain the fuzzy comprehensive evaluation score;

deriving the final evaluation result from the fuzzy comprehensive evaluation score and the grade standard.
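A minimal Python sketch of the composition, normalization, and scoring steps follows. It assumes the weighted-average composition operator (a plain weight-times-matrix product); the grade values and cutoffs are invented for illustration and are not taken from the source, and the names `fuzzy_score` and `grade` are hypothetical.

```python
def fuzzy_score(weights, matrix, grade_values):
    """Compose weights with the evaluation matrix and score the result.

    weights[j]      weight of evaluation index j
    matrix[j][k]    membership of index j in comment k
    grade_values[k] numeric value assigned to comment k (assumed)
    """
    # Composition: b_k = sum_j w_j * r_jk (weighted-average operator, assumed).
    vector = [
        sum(w * row[k] for w, row in zip(weights, matrix))
        for k in range(len(grade_values))
    ]
    # Normalize the evaluation vector so that it sums to 1.
    total = sum(vector)
    vector = [b / total for b in vector]
    # Weighted average of the grade values gives the final score.
    score = sum(b * v for b, v in zip(vector, grade_values))
    return vector, score


def grade(score, cutoffs=((90, "very good"), (75, "good"), (60, "average"))):
    """Map a score to a comment using hypothetical cutoffs."""
    for cutoff, label in cutoffs:
        if score >= cutoff:
            return label
    return "poor"


if __name__ == "__main__":
    w = [0.5, 0.5]
    r = [[1, 0, 0, 0], [0, 1, 0, 0]]
    vec, s = fuzzy_score(w, r, [100, 80, 60, 40])
    print(vec, s, grade(s))
```

Other composition operators, such as max-min, are also used in fuzzy comprehensive evaluation; the weighted-average form is chosen here only because it keeps the contribution of every index visible in the score.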

A computer device for the analysis and evaluation of privacy-protected data comprises a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the program, the processor performs the following steps:

constructing a privacy protection hierarchical analysis model based on probability statistics and solving for the privacy protection evaluation indices of the privacy-protected data;

establishing, from the privacy protection evaluation indices so obtained, a privacy protection fuzzy comprehensive evaluation model based on the Del entropy method;

obtaining the comprehensive evaluation score of the privacy protection fuzzy comprehensive evaluation model, and deriving the final evaluation result from the score and the grade standard.

A computer-readable storage medium stores a computer program for the analysis and evaluation of privacy-protected data. When executed by a processor, the program performs the following steps:

constructing a privacy protection hierarchical analysis model based on probability statistics and solving for the privacy protection evaluation indices of the privacy-protected data;

establishing, from the privacy protection evaluation indices so obtained, a privacy protection fuzzy comprehensive evaluation model based on the Del entropy method;

obtaining the comprehensive evaluation score of the privacy protection fuzzy comprehensive evaluation model, and deriving the final evaluation result from the score and the grade standard.

Compared with the prior art, the beneficial effects of the invention are:

(1) The invention establishes a privacy protection hierarchical analysis model based on probability statistics and a privacy protection fuzzy comprehensive evaluation model based on the Del entropy method. Two-stage backtracking hierarchical analysis yields the evaluation index values and evaluation parameter values of the overall distribution of the privacy-protected data, from which the single-factor evaluation vectors are computed, reducing the interference of human factors.

(2) The invention combines and improves the Delphi method, which relies on subjective judgment, and the entropy method, which relies on objective judgment. This improves the accuracy and fairness of the weights assigned to the evaluation indices and enables a visual evaluation of the block-obfuscation privacy protection method.

(3) The proposed evaluation method not only presents the privacy protection evaluation result objectively but also demonstrates the effectiveness of the block-obfuscation privacy protection method, giving sound theoretical support for data privacy protection in big data application support platforms.

Description of the Drawings

The accompanying drawings, which form part of this application, provide a further understanding of the application. The illustrative embodiments and their descriptions explain the application and do not unduly limit it.

Figure 1 is a flow chart of the privacy protection evaluation method for a big data platform based on the Del entropy method;

Figure 2 is an architecture diagram of privacy protection based on block obfuscation;

Figure 3 is a schematic diagram of the privacy protection hierarchical analysis model based on probability statistics;

Figure 4 shows the distribution of the privacy attribute leakage probability under the three algorithms;

Figure 5 shows the distribution of the privacy attribute leakage ratio under the three algorithms;

Figure 6 shows how the privacy attribute value entropy varies under the three algorithms;

Figure 7 shows the distribution of the block deviation degree under the three algorithms;

Figure 8a is a comparative analysis of weight accuracy for the Del entropy method;

Figure 8b is a comparative analysis of weight accuracy for the actual measured values;

Figure 8c is a comparative analysis of weight accuracy for the Delphi method;

Figure 9 shows the relationship between the privacy evaluation effect and time;

Figure 10 shows the relationship between the privacy evaluation effect and the number of attributes.

Detailed Description

The invention is further described below with reference to the drawings and embodiments.

The following detailed description is exemplary and is intended to explain the application further. Unless otherwise indicated, all technical and scientific terms used herein have the meaning commonly understood by a person of ordinary skill in the art to which this application belongs.

The terminology used here serves only to describe specific embodiments and is not intended to limit the exemplary embodiments of this application. As used herein, the singular forms are intended to include the plural forms unless the context clearly indicates otherwise. The terms "comprising" and "including", when used in this specification, indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.

As described in the Background, the prior art cannot evaluate the privacy protection effect of the block-obfuscation method well. To solve this technical problem, the application proposes a privacy protection evaluation method and device for a big data platform based on the Del entropy method.

In a typical embodiment of the application, shown in Figure 1, the block-obfuscation privacy protection method in a big data application support platform divides the original data into several data blocks and obfuscates the relationships among the data, protecting big data privacy while the data remains in plaintext. This mechanism, however, cannot show users the privacy protection effect in an intuitive way. Building on the block-obfuscation method, the invention provides a privacy protection evaluation method for a big data platform based on the Del entropy method. The method enables a visual evaluation of big data privacy protection, demonstrates the privacy protection effect of block obfuscation on big data, and provides theoretical support for further research on big data privacy protection.

1. Definition of the privacy protection evaluation indices

(1) Sub-tuple (ST). A data record is DR = {DA_1, DA_2, ..., DA_n}, where each DA is a data attribute name. According to the privacy constraints, DR is divided into several fragments, for example DR = {(DA_1, DA_3, DA_6), (DA_5, DA_8, DA_n), ..., (DA_4, DA_9, DA_11)}. Each fragment is a sub-tuple ST, and any two distinct sub-tuples ST_i and ST_j are disjoint: ST_i ∩ ST_j = ∅.

(2) Privacy attribute leakage probability (PALP). The PALP is the probability of obtaining, from the set of privacy-protected data blocks, a record that contains both the correct tuple identifier and a privacy attribute.

The leakage probability of privacy attribute PA_i in data block i is:

[formula not reproduced in the source text]

where N_N denotes the data records in the data block that share the same name in the primary attribute; N_V is the number of values that privacy attribute PA_i takes in data block i; R_i is the total number of records in data block i; and R_NVj is the number of records in data block i whose attribute PA takes the value j.

Let λ be the standard value that PALP must not exceed. The leakage probability vector of all privacy attributes is P_1 = [PALP_PA1, PALP_PA2, ..., PALP_PANp]. If PALP_PAi > λ, privacy attribute PA_i is considered to have leaked.

(3) Privacy attribute leakage ratio (PALR). Suppose data storage mode DSM_1 contains P_1 privacy attributes, of which P_2 leak privacy. The leakage ratio is the leaked fraction, PALR = P_2/P_1.
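The threshold test and the leakage ratio can be illustrated with a short Python sketch. It assumes that the ratio is the number of leaked attributes divided by the total number of privacy attributes; the function name `leakage_ratio`, the attribute names, and the sample probabilities are hypothetical.

```python
def leakage_ratio(palp, threshold):
    """Return the leaked attributes and the privacy attribute leakage ratio.

    palp maps each privacy attribute name to its leakage probability;
    an attribute counts as leaked when its PALP exceeds the threshold.
    """
    leaked = [name for name, p in palp.items() if p > threshold]
    # Assumed definition: leaked attributes over all privacy attributes.
    return leaked, len(leaked) / len(palp)


if __name__ == "__main__":
    probabilities = {"PA1": 0.1, "PA2": 0.6, "PA3": 0.7, "PA4": 0.2}
    print(leakage_ratio(probabilities, threshold=0.5))
```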

(4) Privacy attribute value entropy (PAVE). Suppose the privacy attribute PA_i (1 ≤ i ≤ n) in data block i can take N_PAi values {V_1, V_2, ..., V_NPAi}, and the j-th value occurs with ratio VR_j (1 ≤ j ≤ N_PAi). The privacy attribute value entropy PAVE_i of PA_i is:

[formula not reproduced in the source text]

where VR_j is the ratio with which the privacy attribute PA_i in data block i takes its j-th value.

The larger the PAVE, the more dispersed the value ratios of the privacy attribute and the better the privacy protection. The smaller the PAVE, the more concentrated the value ratios and the worse the privacy protection.
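Because the PAVE formula itself is missing from this text, the following Python sketch assumes the standard Shannon entropy of the value ratios VR_j with a base-2 logarithm. The function name `pave` and the sample data are hypothetical, and the logarithm base in the original formula may differ.

```python
import math
from collections import Counter


def pave(values):
    """Shannon entropy (base 2, assumed) of one privacy attribute's values.

    values is the list of values the attribute takes across the records
    of one data block; the ratio VR_j is the share of records holding
    the j-th distinct value.
    """
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    # Four equally frequent values: ratios are maximally dispersed.
    print(pave(["flu", "asthma", "diabetes", "cold"]))
    # A single repeated value: ratios are fully concentrated.
    print(pave(["flu", "flu", "flu", "flu"]))
```

The first call yields the maximum entropy for four values and the second yields zero, matching the statement that dispersed value ratios indicate better protection than concentrated ones.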

(5) Block deviation degree (Chunk Deviation Degree, CDD). Within one DSM, random insert, delete, and update operations are first applied to the sample, and the number of changed tuples in each data block is recorded after each operation. The gaps between data blocks are then computed, giving a mean gap for each data block. Finally, the mean of the per-block mean gaps is taken as the block deviation degree of the DSM. For example, if DSM_1 contains N_c data blocks whose numbers of changed tuples are DC_1, DC_2, ..., DC_Nc, the CDD is computed as follows:

[formula not reproduced in the source text]

where N_c is the number of data blocks in DSM_1; DC_i is the number of changed tuples in data block i; and DC_j is the number of changed tuples in data block j.

The smaller the CDD, the smaller the data change distance between data blocks, the smaller the operational differences between blocks, and the worse the privacy protection. The larger the CDD, the larger the data change distance between blocks, the larger the operational differences, and the better the privacy protection.
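The verbal description of the CDD can be turned into a small Python sketch. Since the formula is missing from this text, the sketch assumes that the gap between two blocks is the absolute difference of their changed-tuple counts and that each block's mean gap is taken over the other N_c - 1 blocks. The function name `chunk_deviation_degree` is hypothetical.

```python
def chunk_deviation_degree(changed):
    """Mean of the per-block mean gaps in changed-tuple counts.

    changed[i] is DC_i, the number of changed tuples in data block i.
    The gap |DC_i - DC_j| and the N_c - 1 divisor are assumptions.
    """
    n = len(changed)
    if n < 2:
        return 0.0  # a single block has no other block to deviate from
    per_block = [
        sum(abs(ci - cj) for j, cj in enumerate(changed) if j != i) / (n - 1)
        for i, ci in enumerate(changed)
    ]
    return sum(per_block) / n


if __name__ == "__main__":
    print(chunk_deviation_degree([2, 4, 6]))
    print(chunk_deviation_degree([5, 5, 5]))
```

Blocks that change by identical amounts give a CDD of zero, the case the text associates with the weakest protection.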

2. Privacy protection technique based on block obfuscation

The block-obfuscation privacy protection mechanism constructs a trusted interaction model among three parties (a trusted third party, the cloud service provider, and the tenant) to protect tenants' data privacy while the data remains in plaintext, as shown in Figure 2.

First, according to the privacy constraints customized by the tenant, the data submitted by the tenant is divided into multiple data blocks so that the attributes forming a combined privacy attribute within one privacy constraint are distributed across different data blocks; that is, the tenant's identity information is separated from the sensitive information. Single privacy attributes are split and transformed at the same time. Second, the privacy protection module cooperates with the third party and uses multiplicative homomorphic encryption to generate a data slice identifier for the data slices that belong to the same record but lie in different data blocks. This obfuscates the relationships between data blocks, and the relationships are stored with the trusted third party for later data reconstruction. Third, after obfuscation an attacker can hardly discover which records in different data blocks correspond to each other, but when the data in a block is unevenly distributed the attacker may still guess an individual's private information with high probability. Three equalization mechanisms, α, β, and γ, are therefore proposed; by adding forged data they ensure that the probability of a privacy leak does not exceed max(1/n, β_k, γ_k). Finally, the data blocks are stored on the data nodes according to each node's computing capacity, storage capacity, and load capacity.

3. Privacy protection hierarchical analysis model based on probability statistics

A privacy protection hierarchical analysis model based on probability statistics is constructed to solve for the evaluation indices, as shown in Figure 3. The top layer of the model is the object to be evaluated, namely the privacy-protected data. According to factors such as differences in tenant requirements, SaaS application requirements, and the interests of the cloud service provider, data storage is divided into multiple data storage modes (DSMs), which form the second layer in Figure 3. Each data storage mode has its own privacy protection blocking strategy, which forms the third layer in Figure 3. Each DSM and data block can store the data of several tenants or of a single tenant, which forms the fourth layer in Figure 3. The fifth layer holds the privacy protection evaluation indices: the privacy attribute leakage probability, the privacy attribute leakage ratio, the privacy attribute value entropy, and the block deviation degree. The sixth layer is backtracking. First, from the definitions and calculation methods of the indices, the index values of each tenant under each DSM are computed from the tenant's perspective. Then the indices of each DSM are computed from the indices of all its tenants. Finally, the indices of the whole data store are computed from the indices of all DSMs. The detailed calculation consists of two main steps, as follows:

3.1. Backtracking from the tenant layer to the DSM layer

Assume that data storage mode $DSM_1$ contains $N_1$ tenants. To measure the privacy protection evaluation indexes of this mode, $X_1$ groups are selected from the $N_1$ tenants, each group containing $N_2$ tenants ($N_2 < N_1$). Each group is treated as a sample $X$, and $DSM_1$ is treated as the population $Y$, composed of countless individual samples, which is the object of study. The privacy protection evaluation indexes of the population $Y$ can therefore be estimated by studying the index characteristics of multiple samples. The detailed calculation is as follows.

(1) Privacy attribute leakage probability (PALP)

The PALP of $DSM_1$ is evaluated by analyzing the PALP of the samples $X$, in the following steps:

① Calculate the privacy attribute leakage probability vector of every sample $X$, that is, $XPP_t = [PALP_{PA_1}, PALP_{PA_2}, \ldots, PALP_{PA_{N_p}}]$, $0 < t \le X_1$

② Calculate the mean of XPP

where $X_1$ is the number of samples $X$; $N_{V_i}$ is the number of attribute values of the $i$-th privacy attribute $PA_i$ in data block $i$; $R_i$ is the total number of records in data block $i$; $R_{N_{V_j}}$ is the total number of records in data block $N_{V_j}$; the number of attribute values of the $N_p$-th privacy attribute is defined in the same way; and $R_{N_p}$ is the total number of records in data block $N_p$.

③ Calculate the variance of XPP
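
Steps ① to ③ can be sketched as follows. This is only an illustration under stated assumptions: the per-sample PALP vectors are taken as already computed, the numbers are invented, the function name `summarize_samples` is hypothetical, and the ordinary mean and population variance stand in for the source's formulas, which are not reproduced in this text.

```python
from statistics import fmean, pvariance

def summarize_samples(vectors):
    """Return per-attribute (means, variances) across sample vectors.

    vectors[t][i] is the index value (for example PALP) of privacy
    attribute i in sample t. All vectors must have the same length.
    """
    if not vectors:
        raise ValueError("at least one sample vector is required")
    columns = list(zip(*vectors))  # regroup the values by privacy attribute
    means = [fmean(col) for col in columns]
    variances = [pvariance(col) for col in columns]
    return means, variances

# Hypothetical XPP vectors: X_1 = 3 samples, N_p = 2 privacy attributes.
xpp = [[0.10, 0.20], [0.20, 0.20], [0.30, 0.20]]
means, variances = summarize_samples(xpp)
```

The same helper would apply unchanged to the PALR, PAVE and CDD values of the samples, since each of those steps also asks for a mean and a variance.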

(2) Privacy attribute leakage ratio (PALR)

The privacy attribute leakage ratio PALR can be calculated from $XP_t$ ($0 < t \le X_1$), as follows:

① For each $XP_t \in XPP$, if $XP_t > \lambda$, the privacy attribute leakage ratio is $PALR_t$

② Calculate the mean of PALR

③ Calculate the variance of PALR

(3) Privacy attribute value entropy (PAVE)

The PAVE of $DSM_1$ is evaluated from the PAVE of the samples, as follows:

① Calculate the privacy attribute value entropy vector XPE of every sample $X$, that is, $XPE_t = [PAVE_{PA_1}, PAVE_{PA_i}, \ldots, PAVE_{PA_{N_p}}]$, $0 < t \le X_1$

② Calculate the mean of XPE

③ Calculate the variance of XPE

(4) Block deviation degree (CDD)

① For each group in $X_1$, perform $N_u$ insert operations, $N_u$ delete operations and $N_u$ update operations;

② Calculate the number of changed tuples in each data block of each group;

③ Calculate the CDD of each group according to formula 3;

④ Calculate the mean of CDD

⑤ Calculate the variance of CDD

3.2. Backtracking from the DSM layer to the top layer

During backtracking from the tenant layer to the DSM layer, statistical analysis of the tenant-layer data yields the evaluation indexes and evaluation parameter values of the DSM layer. The evaluation indexes are privacy attribute leakage probability, privacy attribute leakage ratio, privacy attribute value entropy and block deviation degree; the evaluation parameter values are the mean and variance of each of these four indexes. Backtracking then continues to the top layer using the evaluation index information of all DSMs.

(1) Privacy attribute leakage probability

① Calculate the mean PALP of each DSM

where $N_{pi}$ is the number of tenants in each DSM;

② Calculate the mean of DSM_PALP

③ Calculate the variance of DSM_PALP

(2) Privacy attribute leakage ratio

① Calculate the mean of DSM_PALR

② Calculate the variance of DSM_PALR

(3) Privacy attribute value entropy

① Calculate the mean PAVE of each DSM

② Calculate the mean of DSM_PAVE

③ Calculate the variance of DSM_PAVE

(4) Block deviation degree

① Calculate the mean CDD of each DSM

② Calculate the mean of DSM_CDD

③ Calculate the variance of DSM_CDD

Through this two-stage backtracking, from the tenant layer to the DSM layer and from the DSM layer to the top layer, the evaluation indexes and evaluation parameter values describing the overall distribution of the privacy-protected data are calculated, which supports the comprehensive evaluation of privacy protection.
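
The two-stage backtracking can be sketched for a single evaluation index as shown here. The sketch makes several assumptions: per-tenant index values are already available, every DSM counts equally when the DSM means are combined, and the name `backtrack` and the sample numbers are hypothetical.

```python
from statistics import fmean, pvariance

def backtrack(tenant_values_by_dsm):
    """Two-stage roll-up of one evaluation index.

    tenant_values_by_dsm maps a DSM name to the list of per-tenant index
    values observed in that DSM.
    Returns (per-DSM means, overall mean, overall variance).
    """
    # Stage 1: tenant layer -> DSM layer.
    dsm_means = {name: fmean(vals) for name, vals in tenant_values_by_dsm.items()}
    # Stage 2: DSM layer -> top layer (unweighted, which is an assumption).
    values = list(dsm_means.values())
    return dsm_means, fmean(values), pvariance(values)

# Hypothetical PALP values for two storage modes with two tenants each.
data = {"DSM1": [0.1, 0.3], "DSM2": [0.2, 0.4]}
dsm_means, overall_mean, overall_var = backtrack(data)
```

If the DSMs hold very different numbers of tenants, a tenant-weighted mean in stage 2 may be more appropriate; the source does not settle this point in the text available here.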

4. Fuzzy comprehensive evaluation of privacy protection based on the Delphi-entropy method

A privacy protection fuzzy comprehensive evaluation model is established from the above evaluation indexes, as follows:

(1) Determine the set of evaluation factors for the evaluation object

The evaluation object of the privacy protection fuzzy comprehensive evaluation model is the privacy-protected data after block obfuscation, so the evaluation factor set is ES = {privacy attribute leakage probability, privacy attribute leakage ratio, privacy attribute value entropy, block deviation degree}.

(2) Determine the comment set

The comment set of the privacy protection fuzzy comprehensive evaluation model is CS = {very good, good, fair, poor}, and the evaluation criteria are shown in Table 1. In addition to the criteria given in the table, it must be ensured that $\lambda_4\%$ of the fluctuation range of an evaluation index falls in the same grade as the index value, or in a higher grade. $\lambda$, $\lambda_1$, $\lambda_2$, $\lambda_3$ and $\lambda_4$ are standard values agreed between the service provider and the tenant, where $\lambda_2$ depends on the number of distinct values of each privacy attribute.

Table 1 Evaluation criteria

(3) Single-factor evaluation

The present invention uses the privacy protection hierarchical analysis model based on probability statistics to calculate the evaluation index values and evaluation parameter values required by the privacy protection fuzzy comprehensive evaluation model. From these values the fluctuation range of each evaluation index is calculated. The index value and its fluctuation range are then compared with the evaluation criteria in Table 1 to determine the grade, which gives an evaluation vector for each evaluation index. These evaluation vectors are combined to form the evaluation matrix EM.

(4) Comprehensive evaluation

Fuzzy comprehensive evaluation depends on two elements: the evaluation matrix EM and the weight W. The steps above yield the evaluation matrix EM; what follows describes how the weights of the evaluation indexes are obtained. Methods for determining W are mainly subjective or objective. Subjective methods are simple but depend heavily on human judgment. Objective methods remove that interference but rely too much on the sample data, so changes in the sample data strongly affect the weights. The present invention therefore proposes a combined weight determination method based on the Delphi method and the entropy method.

To make the weights of the evaluation indexes more accurate, the Delphi method is improved. The original Delphi method questions one group of review experts over several rounds until their opinions converge, and then derives the weight of each evaluation index. The present invention instead draws from the $n$ review experts $m$ times to form $m$ groups; every draw selects the same number of experts, but the members differ. This gives the scores of $m$ groups for the evaluation indexes, which form the decision matrix shown in Table 2.
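
The group-drawing step of the improved Delphi method could look like the sketch below. It assumes uniform random sampling without replacement within each draw, which the source does not specify; the function name `draw_expert_groups`, the expert labels and the seed are hypothetical. The sizes (15 experts, 8 groups of 10) mirror those used later in Experiment 2.

```python
import random

def draw_expert_groups(experts, m, group_size, seed=None):
    """Draw m groups, each holding group_size distinct experts.

    Different groups may overlap, and two draws could in principle pick
    the same members; this sketch does not exclude that case.
    """
    if group_size > len(experts):
        raise ValueError("group_size cannot exceed the number of experts")
    rng = random.Random(seed)
    return [rng.sample(experts, group_size) for _ in range(m)]

experts = [f"expert_{i}" for i in range(1, 16)]  # n = 15
groups = draw_expert_groups(experts, m=8, group_size=10, seed=0)
```

Each group would then score the four evaluation indexes, giving one row of the decision matrix.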

Table 2 Decision matrix

Based on the decision matrix, the entropy method is used to obtain the weight of each privacy protection evaluation index, as follows:

Calculate the total score $Score_j$ of all groups under the $j$-th privacy protection evaluation index, $Score_j = \sum_{i=1}^{m} X_{ij}$, where $X_{ij}$ is the score given by group $i$ under the $j$-th index;

Calculate the contribution $P_{ij}$ of the $i$-th group's score under the $j$-th privacy protection evaluation index, $P_{ij} = X_{ij} / Score_j$;

Calculate the entropy $E_j$ of all group scores for the $j$-th privacy protection evaluation index, $E_j = -K \sum_{i=1}^{m} P_{ij} \ln P_{ij}$. In the present invention the constant is set to $K = 1/\ln(m)$, which guarantees $0 \le E_j \le 1$. It follows that when all group scores under an evaluation index are exactly equal, $E_j = 1$; the weight of that index is then 0, and it can be ignored in decision analysis.

Calculate the importance $D_j$ of the $j$-th privacy protection evaluation index, $D_j = 1 - E_j$. The more the scores for an index differ, the smaller the entropy $E_j$ and the larger $D_j$, meaning that the $j$-th index has a greater influence on the evaluation.

Calculate the weight $W_j$ of the $j$-th privacy protection evaluation index, $W_j = D_j / \sum_{k=1}^{4} D_k$;

The weights of the privacy protection evaluation indexes are $W = (W_1, W_2, W_3, W_4)$.
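
The entropy-method steps above can be sketched in code as follows. This is a minimal illustration, not the patent's implementation: it assumes the standard entropy weight method with natural logarithms, the function name `entropy_weights` is hypothetical, and the decision matrix is invented.

```python
import math

def entropy_weights(scores):
    """Entropy-method weights from a decision matrix.

    scores[i][j] is the positive score given by group i to index j.
    Returns (E, D, W): entropy, importance and weight of each index.
    """
    m = len(scores)       # number of groups
    n = len(scores[0])    # number of evaluation indexes
    if m < 2:
        raise ValueError("at least two groups are needed so that ln(m) > 0")
    k = 1.0 / math.log(m)
    entropies = []
    for j in range(n):
        column = [row[j] for row in scores]
        total = sum(column)                     # Score_j
        probs = [x / total for x in column]     # P_ij
        entropies.append(-k * sum(p * math.log(p) for p in probs if p > 0))
    importance = [1.0 - e for e in entropies]   # D_j
    d_sum = sum(importance)
    if d_sum == 0:
        raise ValueError("all indexes have identical scores; weights are undefined")
    weights = [d / d_sum for d in importance]   # W_j
    return entropies, importance, weights

# Hypothetical scores: 3 groups, 2 indexes. Index 0 is scored identically
# by every group, so its entropy is 1 and its weight is (close to) 0.
E, D, W = entropy_weights([[7, 5], [7, 9], [7, 3]])
```

An index on which all groups agree carries no discriminating information, which is why its weight vanishes; the index with the most divergent scores receives the largest weight.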

Once the evaluation matrix and the evaluation index weights are available, the evaluation vector S can be calculated. The chosen fuzzy operator clearly reflects the effect of the weights, integrates the factors strongly and makes full use of the information in the evaluation matrix EM, so this operator is used here to solve for the evaluation vector S, as shown in formula 25.

$S = W \cdot EM$ (25)
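
Formula 25 can be read as an ordinary vector-matrix product, as in the sketch below. That reading is an assumption, since the operator symbol is missing from this text; the function name `composite` and the values of `W` and `EM` are illustrative only.

```python
def composite(weights, em):
    """Compute S = W . EM as a weighted sum of the rows of EM.

    weights holds one weight per evaluation index.
    em[j][c] is the membership of index j in grade c.
    """
    if len(weights) != len(em):
        raise ValueError("one weight per evaluation-matrix row is required")
    grades = len(em[0])
    return [sum(w * row[c] for w, row in zip(weights, em)) for c in range(grades)]

# Illustrative weights and a hypothetical 4 x 4 evaluation matrix
# (rows: PALP, PALR, PAVE, CDD; columns: very good, good, fair, poor).
W = [0.34, 0.14, 0.24, 0.28]
EM = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
S = composite(W, EM)
```

Because every row of EM and the weight vector each sum to 1, the resulting S also sums to 1 in this example.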

In addition, the present invention converts the evaluation result into a score and sets the grade standard for privacy protection evaluation levels, as shown in Table 3.

Table 3 Correspondence between comments and scores

Comment    Very good    Good     Fair     Poor
Score      100-75       75-50    50-25    25-0

Finally, the comprehensive privacy protection evaluation proceeds as follows:

Calculate the evaluation vector $S = (S_1, S_2, S_3, S_4)$;

Normalize it: $S' = \left(\frac{S_1}{S_1+S_2+S_3+S_4}, \frac{S_2}{S_1+S_2+S_3+S_4}, \frac{S_3}{S_1+S_2+S_3+S_4}, \frac{S_4}{S_1+S_2+S_3+S_4}\right)$;

Obtain the fuzzy comprehensive evaluation score by weighted average, $ER = (100 S_1' + 75 S_2' + 50 S_3' + 25 S_4') / (S_1' + S_2' + S_3' + S_4')$, and derive the final evaluation result from the value of ER and the grade standard.
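
The normalization and scoring steps can be sketched as follows. The grade boundaries in Table 3 overlap at 75, 50 and 25, so assigning a boundary score to the higher grade is an assumption of this sketch; the function names `fuzzy_score` and `grade` are hypothetical. The input vector is the one reported for HMDR in the experimental section.

```python
def fuzzy_score(s):
    """Normalize the evaluation vector S and return the score ER."""
    total = sum(s)
    if total <= 0:
        raise ValueError("the evaluation vector must have a positive sum")
    s_norm = [x / total for x in s]
    grade_points = (100, 75, 50, 25)  # very good, good, fair, poor
    return sum(p * x for p, x in zip(grade_points, s_norm)) / sum(s_norm)

def grade(er):
    """Map a score to a comment from Table 3 (boundary choice is assumed)."""
    if er >= 75:
        return "very good"
    if er >= 50:
        return "good"
    if er >= 25:
        return "fair"
    return "poor"

er = fuzzy_score([0.2260, 0.3896, 0.3004, 0.084])
result = grade(er)
```

For this vector the score comes out at about 68.94, matching the HMDR figure quoted in the experiments, and falls in the "good" band.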

5. Experimental evaluation

5.1 Experimental environment

To verify the evaluation effect of the privacy protection evaluation method based on the Delphi-entropy method proposed by the present invention, simulation experiments were used to evaluate and verify the original block obfuscation algorithm PPCC, the privacy protection adjustment algorithm SAG and the data relationship hiding algorithm HMDR.

Experimental environment: servers are used as data nodes, running Red Hat Enterprise Linux 6.2; Apache Tomcat is the application server, configured with a 4-core Intel(R) Xeon(R) 2.40 GHz CPU, 10 GB of memory and a 500 GB hard disk; the test database is MySQL Community Server 5.5.25 (GPL); the development environment is Eclipse-SDK-4.3-win64 and the programming language is Java 1.7. The data set used in the experiments is the 1990 US Census data set published by UCI; the original data set contains 68 attributes and 2.5 million records.

5.2 Result analysis

5.2.1 Evaluation indexes

To verify whether the evaluation indexes proposed by the present invention correctly assess the privacy protection effect of the block obfuscation method, simulation Experiment 1 was designed to check how the four evaluation indexes differ under the three algorithms. Experiment 1 selects 1.5 million records and 60 attributes from the data set, 20 of which are privacy attributes. Fifteen storage modes are set up and each is allocated 100,000 records; each mode is assumed to hold 20 tenants, 15 of which are drawn each time, for 30 draws in total. First, the PPCC algorithm is applied to the data in all storage modes; 50 arbitrary insert, 50 delete and 50 update operations are performed, the privacy requirements are adjusted 15 times, and all required evaluation indexes are calculated according to formulas 1 to 25. The data is then restored to its original state, PPCC and HMDR are applied in sequence, and the data operations and index calculations are repeated. Finally, the data is restored again, all three algorithms are applied in sequence, and the data operations and index calculations are repeated. The results are shown in Figures 4, 5, 6 and 7.

Figure 4 shows the distribution of the privacy attribute leakage probability over the 15 modes under the three algorithms, where a PALP of no more than 0.2 is defined as the safe range. Under PPCC, PALP is mainly distributed between 0.05 and 0.3, with 66.7% of the probability lying between 0.05 and 0.2; because of data operations and changes in privacy requirements, there is a 33.3% probability of privacy attribute leakage. Under HMDR, PALP is mainly distributed between 0.00 and 0.25, with 80% of the probability lying between 0.05 and 0.2; the privacy protection effect is slightly better than with PPCC, but there is still a 20% probability of privacy attribute leakage. Under SAG, PALP is mainly distributed between 0.00 and 0.2, which meets the privacy protection requirement, yet a 6.7% probability of privacy attribute leakage remains. In terms of PALP, therefore, the privacy protection effect still has room for improvement.

Figure 5 shows the distribution of the privacy attribute leakage ratio over the 15 modes under the three algorithms, where a PALR of no more than 0.3 is defined as the safe range. Under PPCC there is a 26.6% probability that PALR exceeds the safe range, but still a 66.6% probability of a fairly good privacy protection effect. Compared with PPCC, HMDR has only a 6.7% probability of exceeding the safe range and a 73.3% probability of a fairly good privacy protection effect, a significant improvement. Compared with the first two algorithms, SAG has a 13.3% probability of exceeding the safe range but an 80% probability of a very good privacy protection effect.

Figure 6 shows how the privacy attribute value entropy PAVE changes under the three algorithms. Under all three algorithms, the more values a privacy attribute has, the larger the entropy and the better the corresponding privacy protection effect. For example, when a privacy attribute has 32 values, the entropy of all three algorithms is close to 5, the entropy of the ideal state; when it has 4 values, the entropy of the three algorithms is further from the ideal value of 2. The reason is that more attribute values give a larger PAVE and a wider distribution of attribute value probabilities, and hence better privacy protection. The entropy of SAG is higher than that of HMDR and PPCC, and the entropy of HMDR is slightly higher than that of PPCC, which matches the expected result.
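
The ideal entropy values of 5 and 2 quoted above are consistent with Shannon entropy in bits over a uniform distribution. The sketch below assumes that PAVE is this base-2 entropy of an attribute's value distribution; the definition is not restated in this section, and the function name `pave` and the skewed distribution are hypothetical.

```python
import math

def pave(probabilities):
    """Shannon entropy, in bits, of a privacy attribute's value distribution."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

uniform_32 = pave([1 / 32] * 32)        # ideal case for 32 attribute values
uniform_4 = pave([1 / 4] * 4)           # ideal case for 4 attribute values
skewed_4 = pave([0.7, 0.1, 0.1, 0.1])   # hypothetical uneven distribution
```

A uniform distribution maximizes the entropy, so any skew in the attribute values, such as the third case, lowers PAVE below the ideal value and makes a guess by an attacker more likely to succeed.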

Figure 7 shows the degree of deviation between data blocks over the 15 modes under the three algorithms, where a block deviation degree greater than 2 is defined as the safe range. Under PPCC, the block deviation degree has a 40% probability of falling between 0 and 2, which indicates that the data blocks are strongly correlated and that data operations affect each block in much the same way; the privacy protection effect is clearly poor. Under the other two algorithms, more than 50% of the probability falls between 2 and 5, which is a good result, and about 13% falls between 8 and 10. SAG has only a 6.7% probability of privacy leakage, and HMDR has a 13.3% probability. Both algorithms therefore strengthen privacy protection compared with PPCC.

5.2.2 Comprehensive comparative evaluation

The experiment above shows that the four evaluation indexes change noticeably across the three algorithms and that the trends agree with the real situation. Experiment 2 was therefore designed to evaluate the three algorithms comprehensively and assign them grades. Experiment 2 selects 2 million records and stores them under 10 modes, 200,000 records per mode. Each mode is set to hold 30 tenants, and the data of 20 tenants is drawn each time; the draws are made in 100 groups, with 50 draws per group. Assume $\lambda = 0.2$, $\lambda_1 = 0.3$, $\lambda_2 = |\log_2(1/N_{PA_i})|$, $\lambda_3 = 2$, $\lambda_4 = 80$, $i = 1, 2, 3, \ldots, N_p$. The data under the 10 modes is processed with the three algorithms in turn; after each processing run, insert, delete and update operations and privacy requirement adjustments are each performed 50 times, the data before and after each run is recorded, and the evaluation index values and evaluation parameter values are calculated. Processing the data obtained from the experiment gives the results shown in Table 4.

Table 4 Frequency distribution of privacy protection grades

The evaluation matrices of the three algorithms can be calculated from Table 4, as follows:
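
One common way to turn a grade frequency table into an evaluation matrix is to divide each index's grade counts by their total. The sketch below assumes that this is how the matrices are derived from Table 4, which the surviving text does not confirm; the function name `evaluation_matrix` and all counts are hypothetical.

```python
def evaluation_matrix(frequencies):
    """Convert per-index grade counts into membership rows that sum to 1."""
    matrix = []
    for counts in frequencies:
        total = sum(counts)
        if total == 0:
            raise ValueError("each index needs at least one observation")
        matrix.append([c / total for c in counts])
    return matrix

# Hypothetical counts. Rows: PALP, PALR, PAVE, CDD.
# Columns: very good, good, fair, poor.
freq = [
    [10, 30, 40, 20],
    [20, 40, 30, 10],
    [25, 25, 25, 25],
    [5, 15, 30, 50],
]
EM = evaluation_matrix(freq)
```

Each row of the result can be read as the degree to which one index belongs to each of the four grades.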

In Experiment 2, 15 researchers from the same team were selected and divided into 8 groups in succession, with 10 people drawn for each group; the evaluation indexes were scored through multiple rounds of questioning, and the results are shown in Table 5.

Table 5 Decision matrix based on the scores of 10 experts

From the decision matrix in Table 5 and Algorithm 2, the contribution of each group is calculated, as shown in Table 6.

Table 6 Contribution table

Group      Index 1     Index 2     Index 3     Index 4
1          0.111111    0.093023    0.111111    0.127660
2          0.126984    0.116279    0.129630    0.127660
3          0.142857    0.186047    0.074074    0.106383
4          0.079365    0.139535    0.185185    0.170213
5          0.142857    0.186047    0.092593    0.085106
6          0.142857    0.046512    0.148148    0.127660
7          0.095238    0.116279    0.185185    0.063830
8          0.158730    0.116279    0.074074    0.191489

The experiment has 8 groups, so the constant in Algorithm 2 is $K = 1/\ln(8) = 0.481$. From this the entropy values of the evaluation indexes are calculated as $E = (0.96343, 0.98427, 0.97425, 0.96949)$ and the importance values as $D = (0.036574, 0.015729, 0.025753, 0.030505)$, so the final weights of the evaluation indexes are $W = (0.34, 0.14, 0.24, 0.28)$. According to formula 26, the normalized evaluation vectors of the three algorithms are $S_{PPCC} = W \cdot EM_{PPCC} = (0.0976, 0.2666, 0.4108, 0.2550)$ and, in the same way, $S_{HMDR} = (0.2260, 0.3896, 0.3004, 0.084)$ and $S_{SAG} = (0.3288, 0.3896, 0.2212, 0.0604)$. Finally, the weighted average method gives scores of 56.67, 68.94 and 74.67 for the three algorithms. According to the grade standard in Table 3, the privacy protection effect of all three algorithms falls in the "good" grade, but in terms of score SAG performs better than HMDR, and HMDR performs better than PPCC, which agrees with the actual situation. In terms of grade, therefore, all three algorithms protect privacy fairly well, while the scores show that the privacy protection effect still needs further strengthening and optimization.
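
As a quick arithmetic check of the figures quoted above, the weights follow from dividing each importance value by the sum of all four. The sketch uses only the numbers given in the text; the variable names are hypothetical.

```python
import math

D = [0.036574, 0.015729, 0.025753, 0.030505]  # importance values quoted above
K = 1 / math.log(8)                            # 8 groups, natural logarithm
W = [d / sum(D) for d in D]                    # W_j = D_j / sum of all D
W_rounded = [round(w, 2) for w in W]
K_rounded = round(K, 3)
```

Rounded to two decimals the weights come out as 0.34, 0.14, 0.24 and 0.28, and the constant as 0.481, in agreement with the values reported.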

5.2.3 Index weight analysis

To verify the effect of the improved combined weight calculation method proposed by the present invention, Experiment 3 was designed: the evaluation index weights are calculated with the Delphi-entropy method and with the Delphi method, and both are then compared against the actual measured values. The experimental conditions and environment are the same as in the experiments above, and the results are shown in Figures 8a, 8b and 8c.

As Figures 8a, 8b and 8c show, the CDD value is 28% in Figure 8a and 12% in Figure 8c, while the actual measured value is 23%, which is closer to Figure 8a. Likewise, for the other indexes the weights obtained with the Delphi-entropy method differ less from the actual measured values than those obtained with the Delphi method, so the result is better. The Delphi method depends mainly on human judgment and is too subjective; as Figure 8c shows, some evaluation indexes receive too much weight and others too little. The Delphi-entropy method also depends partly on subjective judgment, but the entropy method reduces the interference of human factors to some extent, which makes the weight distribution more reasonable.

5.2.4 Evaluation effect analysis

To verify the effectiveness and sustainability of the privacy protection evaluation method based on the Delphi-entropy method proposed by the present invention, Experiment 4 was designed. The data is monitored and evaluated under the protection of the optimized block obfuscation method, with the evaluation indexes and weights unchanged. The data is evaluated once a week, four times in total, and the evaluation process is similar to the comprehensive comparative evaluation. Between evaluations, everyday data operations and business processing are simulated on the data. The results are shown in Figures 9 and 10.

Figure 9 shows that the evaluation score tends to rise over time, but the fluctuation is not large, and the score drops slightly in the third week. In terms of privacy grade, the first, second and fourth weeks are in the "very good" state and the third week is in the "good" state. Overall, the evaluation effect is fairly stable. Figure 10 shows that as the number of data attributes grows, the evaluation score is at first fairly stable, fluctuating only a little up and down, but when the number of attributes reaches 100 the score drops considerably. The reason is that more data attributes produce more data blocks while the number of data nodes is limited, so some data blocks that should not be stored together end up on the same node, which raises the probability of privacy leakage and lowers the evaluation score. The experiment shows that although the block obfuscation privacy protection method protects privacy fairly well, it still needs further improvement.

On a big data platform, a privacy protection method based on block obfuscation can protect data privacy, but tenants cannot see its privacy protection effect directly. To address this problem, the present invention proposes a privacy protection evaluation method based on the Delphi-entropy method that gives a visual privacy evaluation of the block obfuscation technique. It establishes a privacy protection fuzzy comprehensive evaluation model based on the Delphi-entropy method and a privacy protection hierarchical analysis model based on probability statistics. Two-stage backtracking through the hierarchy yields the evaluation index values and evaluation parameter values for the overall distribution of the privacy-protected data, and the single-factor evaluation vectors are calculated from these values, which reduces the interference of human factors. The method also combines and improves the subjective Delphi method and the objective entropy method, which improves the accuracy and fairness of the weight distribution among the evaluation indexes. It thereby provides a visual evaluation of the block obfuscation privacy protection method and gives a theoretical basis and direction for optimizing that method further. Finally, the present invention verifies the effectiveness of the evaluation indexes and the evaluation method through experiments.

Another typical embodiment of the present application provides a computer device for privacy-protected data analysis, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the program:

constructing a privacy protection hierarchical analysis model based on probability statistics, and solving the privacy protection evaluation indexes of the privacy-protected data;

establishing, according to the privacy protection evaluation indexes thus obtained, a privacy protection fuzzy comprehensive evaluation model based on the Delphi-entropy method;

obtaining the comprehensive evaluation score of the privacy protection fuzzy comprehensive evaluation model, and obtaining the final evaluation result according to the score and the grade standard.

Another typical embodiment of the present application provides a computer-readable storage medium storing a computer program for privacy protection data analysis, characterized in that the program, when executed by a processor, implements the following steps:

constructing a privacy protection hierarchical analysis model based on probability statistics, and solving for the privacy protection evaluation indices of the privacy-protected data;

establishing, from the privacy protection evaluation indices obtained above, a privacy protection fuzzy comprehensive evaluation model based on the Delphi-entropy method;

obtaining the comprehensive evaluation score of the privacy protection fuzzy comprehensive evaluation model, and obtaining the final evaluation result from the score and the grading standard.

Although specific embodiments of the present invention have been described above with reference to the accompanying drawings, they do not limit the scope of protection of the invention. Those skilled in the art should understand that any modification or variation that can be made on the basis of the technical solution of the present invention without creative effort remains within the scope of protection of the invention.

Claims (10)

1. A big data platform privacy protection evaluation method based on the Delphi-entropy method, characterized in that the method comprises the following steps:
(1) constructing a privacy protection hierarchical analysis model based on probability statistics, and solving for the privacy protection evaluation indices of the privacy-protected data;
(2) establishing, from the privacy protection evaluation indices obtained in step (1), a privacy protection fuzzy comprehensive evaluation model based on the Delphi-entropy method;
(3) obtaining the comprehensive evaluation score of the privacy protection fuzzy comprehensive evaluation model, and obtaining the final evaluation result from the score and the grading standard.
2. The big data platform privacy protection evaluation method based on the Delphi-entropy method according to claim 1, characterized in that the privacy protection hierarchical analysis model based on probability statistics comprises:
(1-1) an input layer, holding the privacy-protected user data to be evaluated;
(1-2) a data storage model layer, which takes the privacy-protected user data to be evaluated as input and stores it in different data storage models (DSMs) according to differences in user requirements, SaaS application requirements, and the advantages of cloud service providers;
(1-3) a data block layer, which divides the user data in each data storage model into different data blocks;
(1-4) a user layer, comprising the user data stored in each data storage model DSM and in each data block;
(1-5) a privacy protection evaluation index layer, which defines the privacy protection evaluation indices and their calculation formulas;
(1-6) a backtracking layer, which computes the overall evaluation indices and evaluation parameter values of the privacy-protected data through a two-stage backtracking from the user layer to the data storage model layer and from the data storage model layer to the input layer.
3. The big data platform privacy protection evaluation method based on the Delphi-entropy method according to claim 2, characterized in that computing the overall evaluation indices and evaluation parameter values of the privacy-protected data through the two-stage backtracking from the tenant layer to the data storage model layer and from the data storage model layer to the input layer comprises:
computing, according to the definitions of the privacy protection evaluation indices and their calculation formulas, the privacy protection evaluation index values of each user in each DSM; then computing, from the privacy protection evaluation index values of all users, the privacy protection evaluation indices and evaluation parameter values of each DSM; and finally computing, from the evaluation indices and evaluation parameter values of all DSMs, the overall privacy protection evaluation indices and evaluation parameter values of the privacy-protected data.
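The two-stage roll-up described in claim 3 can be illustrated with a minimal Python sketch. The claim does not state how per-user values are combined, so aggregating by the mean and population variance of the values (and taking the overall figures over the per-DSM means) is an assumption made here for illustration, as are the names `summarize`, `backtrack`, and the sample leakage probabilities.

```python
from statistics import fmean, pvariance

def summarize(values):
    """Return (mean, population variance) of a list of index values."""
    return fmean(values), pvariance(values)

def backtrack(per_user):
    """per_user maps DSM name -> {user id -> index value} (hypothetical layout).

    Stage 1 (user layer -> DSM layer): summarize each DSM over its users.
    Stage 2 (DSM layer -> input layer): summarize the data set over DSM means.
    Aggregating by mean/variance is an assumption, not the patent's formula.
    """
    per_dsm = {dsm: summarize(list(users.values()))
               for dsm, users in per_user.items()}
    overall = summarize([mean for mean, _ in per_dsm.values()])
    return per_dsm, overall

# Made-up leakage probabilities for two users in two DSMs.
leak_probability = {
    "dsm_a": {"u1": 0.10, "u2": 0.30},
    "dsm_b": {"u1": 0.20, "u2": 0.40},
}
per_dsm, overall = backtrack(leak_probability)
```

The same roll-up would be repeated once per evaluation index, giving one (mean, variance) pair per index at the DSM level and at the overall level.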
4. The big data platform privacy protection evaluation method based on the Delphi-entropy method according to claim 2, characterized in that the privacy protection evaluation indices comprise private attribute leakage probability, private attribute leakage ratio, private attribute value entropy, and block deviation degree; and the evaluation parameter values comprise the mean and variance of the private attribute leakage probability, private attribute leakage ratio, private attribute value entropy, and block deviation degree.
5. The big data platform privacy protection evaluation method based on the Delphi-entropy method according to claim 1, characterized in that establishing the privacy protection fuzzy comprehensive evaluation model based on the Delphi-entropy method comprises:
(2-1) determining the evaluation factor set and the comment set of the privacy protection evaluation object, and setting the grading standard for the privacy protection evaluation grades;
(2-2) performing a single-factor evaluation on each evaluation index to obtain the evaluation vector of each evaluation index, and constructing the evaluation matrix from the evaluation vectors of the evaluation indices;
(2-3) obtaining the weight of each privacy protection evaluation index using a comprehensive weight determination method based on the Delphi method and the entropy method;
(2-4) obtaining the fuzzy comprehensive evaluation score and the evaluation result using a fuzzy matrix composition operation.
6. The big data platform privacy protection evaluation method based on the Delphi-entropy method according to claim 5, characterized in that the evaluation factor set of the privacy protection evaluation object comprises private attribute leakage probability, private attribute leakage ratio, private attribute value entropy, and block deviation degree; and the comment set comprises very good, good, fair, and poor.
7. The big data platform privacy protection evaluation method based on the Delphi-entropy method according to claim 5, characterized in that obtaining the weight of each evaluation index using the comprehensive weight determination method based on the Delphi method and the entropy method comprises:
making m draws from a pool of evaluation experts to form m groups, each draw selecting a fixed number of experts and the draws differing from one another, and obtaining the scores that the m groups give to the evaluation indices;
constructing a decision matrix from the scores that the m groups give to the evaluation indices;
computing, from the decision matrix, the sum of the comprehensive evaluation scores of all groups under each privacy protection evaluation index;
dividing each group's score by the sum of the comprehensive evaluation scores of all groups under each privacy protection evaluation index, to obtain the contribution of each group's score under each privacy protection evaluation index;
computing, from the contribution of each group's score under each privacy protection evaluation index, the entropy of the scores given by all groups under each privacy protection evaluation index;
computing the importance of each privacy protection evaluation index, and dividing the importance of each privacy protection evaluation index by the total importance to obtain the weight of each privacy protection evaluation index.
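The weighting steps of claim 7 resemble the standard entropy weight method, which the following Python sketch implements under that assumption. The normalizing constant `1 / ln(m)`, the definition of importance as `1 - entropy`, the uniform fallback when all groups agree, and the function name `entropy_weights` are not stated in the claim and are assumptions of this sketch.

```python
import math

def entropy_weights(scores):
    """scores[i][j] is the score expert group i gives to evaluation index j.

    Assumes at least two groups and strictly positive scores.
    Returns one weight per evaluation index; the weights sum to 1.
    """
    m, n = len(scores), len(scores[0])
    # Sum of the scores of all groups under each index (decision-matrix columns).
    totals = [sum(row[j] for row in scores) for j in range(n)]
    k = 1.0 / math.log(m)  # scales the entropy into [0, 1]
    importance = []
    for j in range(n):
        # Contribution of each group's score under index j.
        p = [scores[i][j] / totals[j] for i in range(m)]
        entropy = -k * sum(pi * math.log(pi) for pi in p if pi > 0)
        # Clamp guards against tiny negative values from rounding.
        importance.append(max(0.0, 1.0 - entropy))
    total_importance = sum(importance)
    if total_importance < 1e-12:
        # All groups agree on every index: fall back to equal weights.
        return [1.0 / n] * n
    return [d / total_importance for d in importance]

# Three expert groups scoring two indices (made-up numbers).
weights = entropy_weights([[8, 5], [8, 7], [8, 9]])
```

In this example the groups agree completely on the first index, so its entropy is maximal and nearly all of the weight goes to the second index, where the groups disagree.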
8. The big data platform privacy protection evaluation method based on the Delphi-entropy method according to claim 5, characterized in that obtaining the fuzzy comprehensive evaluation score and the evaluation result using the fuzzy matrix composition operation comprises:
composing the evaluation matrix of the evaluation indices with the weights of the evaluation indices to obtain an evaluation vector;
normalizing the evaluation vector and applying a weighted mean operation to obtain the fuzzy comprehensive evaluation score;
obtaining the final evaluation result from the fuzzy comprehensive evaluation score and the grading standard.
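The composition, normalization, and scoring steps of claim 8 can be sketched in Python as follows. The weighted-average composition operator, the numeric value attached to each comment (`GRADE_VALUES`), the grade thresholds (`THRESHOLDS`), and the sample weights and evaluation matrix are all hypothetical; the claim only says that a grading standard is set, not what it is.

```python
COMMENTS = ["very good", "good", "fair", "poor"]
GRADE_VALUES = [95.0, 80.0, 65.0, 40.0]  # hypothetical value per comment
THRESHOLDS = [(90.0, "very good"), (75.0, "good"), (60.0, "fair")]  # hypothetical

def fuzzy_score(weights, matrix):
    """Compose index weights with the evaluation matrix and score the result.

    matrix[j][k] is the membership of evaluation index j in comment k.
    Uses the weighted-average operator b_k = sum_j w_j * r_jk (an assumption).
    """
    b = [sum(w * row[k] for w, row in zip(weights, matrix))
         for k in range(len(COMMENTS))]
    total = sum(b)
    b = [x / total for x in b]  # normalize the evaluation vector
    score = sum(x * v for x, v in zip(b, GRADE_VALUES))  # weighted mean
    return b, score

def grade(score):
    """Map a score to a comment using the hypothetical thresholds."""
    for lower, label in THRESHOLDS:
        if score >= lower:
            return label
    return "poor"

# Made-up weights and single-factor evaluation vectors for the four indices.
weights = [0.4, 0.3, 0.2, 0.1]
matrix = [
    [0.6, 0.3, 0.1, 0.0],  # private attribute leakage probability
    [0.5, 0.3, 0.2, 0.0],  # private attribute leakage ratio
    [0.2, 0.5, 0.2, 0.1],  # private attribute value entropy
    [0.1, 0.4, 0.4, 0.1],  # block deviation degree
]
vector, score = fuzzy_score(weights, matrix)
result = grade(score)
```

With these made-up inputs the normalized evaluation vector is approximately (0.44, 0.35, 0.18, 0.03), the score is approximately 82.7, and the resulting grade is "good".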
9. A computer device for privacy protection data analysis and evaluation, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the following steps when executing the program:
constructing a privacy protection hierarchical analysis model based on probability statistics, and solving for the privacy protection evaluation indices of the privacy-protected data;
establishing, from the privacy protection evaluation indices obtained above, a privacy protection fuzzy comprehensive evaluation model based on the Delphi-entropy method;
obtaining the comprehensive evaluation score of the privacy protection fuzzy comprehensive evaluation model, and obtaining the final evaluation result from the score and the grading standard.
10. A computer-readable storage medium storing a computer program for privacy protection data analysis and evaluation, characterized in that the program, when executed by a processor, implements the following steps:
constructing a privacy protection hierarchical analysis model based on probability statistics, and solving for the privacy protection evaluation indices of the privacy-protected data;
establishing, from the privacy protection evaluation indices obtained above, a privacy protection fuzzy comprehensive evaluation model based on the Delphi-entropy method;
obtaining the comprehensive evaluation score of the privacy protection fuzzy comprehensive evaluation model, and obtaining the final evaluation result from the score and the grading standard.
CN201810171758.XA 2018-03-01 2018-03-01 Big data platform secret protection evaluation method and device based on Dare Information Entropy Pending CN108416227A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810171758.XA CN108416227A (en) 2018-03-01 2018-03-01 Big data platform secret protection evaluation method and device based on Dare Information Entropy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810171758.XA CN108416227A (en) 2018-03-01 2018-03-01 Big data platform secret protection evaluation method and device based on Dare Information Entropy

Publications (1)

Publication Number Publication Date
CN108416227A true CN108416227A (en) 2018-08-17

Family

ID=63129812

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810171758.XA Pending CN108416227A (en) 2018-03-01 2018-03-01 Big data platform secret protection evaluation method and device based on Dare Information Entropy

Country Status (1)

Country Link
CN (1) CN108416227A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109583224A (en) * 2018-10-16 2019-04-05 阿里巴巴集团控股有限公司 A kind of privacy of user data processing method, device, equipment and system
CN109684865A (en) * 2018-11-16 2019-04-26 中国科学院信息工程研究所 A kind of personalization method for secret protection and device
CN110096868A (en) * 2019-04-28 2019-08-06 深圳前海微众银行股份有限公司 Auditing method, device, equipment and the computer readable storage medium of operation code
CN112561305A (en) * 2020-12-10 2021-03-26 上海对外经贸大学 Enterprise data privacy protection evaluation method based on hierarchical model
CN113077155A (en) * 2021-04-07 2021-07-06 国家电网有限公司 Big data situation perception-based power production technical improvement project evaluation model
CN113641915A (en) * 2021-08-27 2021-11-12 北京字跳网络技术有限公司 Recommended methods, apparatus, devices, storage media and program products for objects
CN114117512A (en) * 2020-12-30 2022-03-01 神州融安数字科技(北京)有限公司 Multi-party calculation model measurement method, device, equipment and storage medium
CN118627125A (en) * 2024-08-12 2024-09-10 山东赛宝信息技术咨询有限公司 Database information leakage detection system based on fuzzy entropy algorithm
CN119089508A (en) * 2024-11-08 2024-12-06 四川旅投数字信息产业发展有限责任公司 A shared management method for privacy data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8332959B2 (en) * 2006-07-03 2012-12-11 International Business Machines Corporation System and method for privacy protection using identifiability risk assessment
CN107392048A (en) * 2017-07-26 2017-11-24 安徽大学 Differential privacy protection method in data visualization and evaluation index thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
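Chen Yu (陈玉): "Research on a SaaS privacy protection optimization mechanism based on block obfuscation" (基于分块混淆的SaaS隐私保护优化机制研究), China Master's Theses Full-text Database, Information Science and Technology series *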

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109583224A (en) * 2018-10-16 2019-04-05 阿里巴巴集团控股有限公司 A kind of privacy of user data processing method, device, equipment and system
CN109583224B (en) * 2018-10-16 2023-03-31 蚂蚁金服(杭州)网络技术有限公司 User privacy data processing method, device, equipment and system
CN109684865A (en) * 2018-11-16 2019-04-26 中国科学院信息工程研究所 A kind of personalization method for secret protection and device
CN110096868A (en) * 2019-04-28 2019-08-06 深圳前海微众银行股份有限公司 Auditing method, device, equipment and the computer readable storage medium of operation code
CN112561305A (en) * 2020-12-10 2021-03-26 上海对外经贸大学 Enterprise data privacy protection evaluation method based on hierarchical model
CN114117512A (en) * 2020-12-30 2022-03-01 神州融安数字科技(北京)有限公司 Multi-party calculation model measurement method, device, equipment and storage medium
CN113077155A (en) * 2021-04-07 2021-07-06 国家电网有限公司 Big data situation perception-based power production technical improvement project evaluation model
CN113077155B (en) * 2021-04-07 2024-05-07 国家电网有限公司 Evaluation model of power production technical transformation projects based on big data situation awareness
CN113641915A (en) * 2021-08-27 2021-11-12 北京字跳网络技术有限公司 Recommended methods, apparatus, devices, storage media and program products for objects
CN113641915B (en) * 2021-08-27 2024-04-16 北京字跳网络技术有限公司 Object recommendation method, device, equipment, storage medium and program product
CN118627125A (en) * 2024-08-12 2024-09-10 山东赛宝信息技术咨询有限公司 Database information leakage detection system based on fuzzy entropy algorithm
CN119089508A (en) * 2024-11-08 2024-12-06 四川旅投数字信息产业发展有限责任公司 A shared management method for privacy data
CN119089508B (en) * 2024-11-08 2025-01-28 四川旅投数字信息产业发展有限责任公司 A method for sharing and managing private data

Similar Documents

Publication Publication Date Title
CN108416227A (en) Big data platform secret protection evaluation method and device based on Dare Information Entropy
CN108304867B (en) Social network-oriented information popularity prediction method and system
Piao et al. Privacy-preserving governmental data publishing: A fog-computing-based differential privacy approach
CN107766745B (en) Hierarchical privacy protection method in hierarchical data release
CN111178408B (en) Health monitoring model construction method and system based on federal random forest learning
CN106846218A (en) A kind of community service end and community service system
CN103902742A (en) Access control determination engine optimization system and method based on big data
Bothe et al. Skyline query processing over encrypted data: An attribute-order-preserving-free approach
Liu et al. Federated personalized random forest for human activity recognition
Reijsbergen et al. {TAP}: Transparent and {Privacy-Preserving} data services
CN118094607B (en) Customer service information service classified storage method and system based on multi-mode large model
CN115309861A (en) Ciphertext retrieval system, method, computer equipment and storage medium
Elabd et al. L–diversity-based semantic anonymaztion for data publishing
Li et al. Incentive and knowledge distillation based federated learning for cross-silo applications
Lin et al. Assessing the impact of differential privacy on population uniques in geographically aggregated data: The case of the 2020 US census
Xu et al. MLPKV: A local differential multi-layer private key-value data collection scheme for edge computing environments
CN110990869A (en) Electric power big data desensitization method applied to privacy protection
CN116776376A (en) A metaverse privacy protection method based on federated learning
CN116467751A (en) Association rule learning method with privacy protection
Wu et al. An ensemble of random decision trees with personalized privacy preservation in edge-cloud computing
Yin et al. Node attributed query access algorithm based on improved personalized differential privacy protection in social network
Wang et al. Intuitionistic fuzzy social network position and role analysis
CN114722064A (en) Sensitive data identification and desensitization method based on presto engine
Xia et al. Hierarchical DP-K anonymous data publishing model based on binary tree
CN113836313B (en) Audit information identification method and system based on map

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180817
