Local Differential Privacy Protection of High-Dimensional Perceptual Data by the Refined Bayes Network
<p>Crowd-Sensing.</p> "> Figure 2
<p>The Bayes network.</p> "> Figure 3
<p>The overview of the local protection on high-dimensional data.</p> "> Figure 4
<p>Data binarization.</p> "> Figure 5
<p>Synthesized dataset <math display="inline"><semantics> <mrow> <msub> <mi>I</mi> <mrow> <mi>sum</mi> </mrow> </msub> </mrow> </semantics></math> vs. <span class="html-italic">f</span>: (<b>a</b>) NLTCS, (<b>b</b>) Adults, (<b>c</b>) TPC-E.</p> "> Figure 6
<p>Mutual information <math display="inline"><semantics> <mrow> <msub> <mi>I</mi> <mrow> <mi>sum</mi> </mrow> </msub> </mrow> </semantics></math> (synthetic dataset) vs. <span class="html-italic">f</span> (k = 2): (<b>a</b>) NLTCS, (<b>b</b>) Adults, (<b>c</b>) TPC-E.</p> "> Figure 7
<p>The accuracy of correlation identification: (<b>a</b>) NLTCS, (<b>b</b>) Adults, (<b>c</b>) TPC-E.</p> "> Figure 8
<p>Probability distribution estimation under different ranges <math display="inline"><semantics> <mrow> <mrow> <mo>|</mo> <mi>Ω</mi> <mo>|</mo> </mrow> </mrow> </semantics></math>: (<b>a</b>) NLTCS (<math display="inline"><semantics> <mrow> <mrow> <mo>|</mo> <mi>Ω</mi> <mo>|</mo> </mrow> <mo>=</mo> <mn>8</mn> </mrow> </semantics></math>), (<b>b</b>) Adults (<math display="inline"><semantics> <mrow> <mrow> <mo>|</mo> <mi>Ω</mi> <mo>|</mo> </mrow> <mo>=</mo> <mn>32</mn> </mrow> </semantics></math>), (<b>c</b>) TPC-E (<math display="inline"><semantics> <mrow> <mrow> <mo>|</mo> <mi>Ω</mi> <mo>|</mo> </mrow> <mo>=</mo> <mn>16</mn> </mrow> </semantics></math>).</p> "> Figure 9
<p>The estimation deviation of<math display="inline"><semantics> <mrow> <mo> </mo> <mi>a</mi> </mrow> </semantics></math>-dimensional joint probability distribution: (<b>a</b>) NLTCS (<math display="inline"><semantics> <mrow> <mi>k</mi> <mo>=</mo> <mn>2</mn> </mrow> </semantics></math>), (<b>b</b>) Adults (<math display="inline"><semantics> <mrow> <mi>k</mi> <mo>=</mo> <mn>2</mn> </mrow> </semantics></math>), (<b>c</b>) TPC-E (<math display="inline"><semantics> <mrow> <mi>k</mi> <mo>=</mo> <mn>2</mn> </mrow> </semantics></math>), (<b>d</b>) NLTCS (<math display="inline"><semantics> <mrow> <mi>k</mi> <mo>=</mo> <mn>2</mn> </mrow> </semantics></math>), (<b>e</b>) Adults (<math display="inline"><semantics> <mrow> <mi>k</mi> <mo>=</mo> <mn>2</mn> </mrow> </semantics></math>), (<b>f</b>) TPC-E (<math display="inline"><semantics> <mrow> <mi>k</mi> <mo>=</mo> <mn>2</mn> </mrow> </semantics></math>).</p> "> Figure 10
<p>SVM classification accuracy by our method: (<b>a</b>) NLTCS, (<b>b</b>) Adults, (<b>c</b>) TPC-E.</p> "> Figure 11
<p>SVM classification accuracy by MeanEST algorithm: (<b>a</b>) NLTCS, (<b>b</b>) Adults, (<b>c</b>) TPC-E.</p> "> Figure 12
<p>SVM classification accuracy by Multi-HM algorithm: (<b>a</b>) NLTCS, (<b>b</b>) Adults, (<b>c</b>) TPC-E.</p> ">
Abstract
:1. Introduction
2. Related Work
3. System Model
4. Basic Knowledge
4.1. Local Differential Privacy
4.2. The Bayes Network
5. The Protection Algorithm of Local Differential Privacy on High-Dimensional Perceptual Data
5.1. Local Privacy Protection
5.1.1. The Algorithm of Local Privacy Protection
Algorithm 1 Data transformation with local differential privacy |
Input: User’s data record, attribute set, random flipping probability f; Output: Randomized binary string of raw data; 1: for do 2: Each user i transform each attribute jth value into a binary string; 3: Randomly flip each bit of to obtain a randomized binary string; 4: end for 5: Concatenate randomized binary strings for all d attributes to obtain. Return:. |
5.1.2. Privacy Analysis
5.2. Synthesis and Publication of High-Dimensional Data after Local Privacy Protection
5.2.1. Conception
5.2.2. The Estimation of the Multidimensional Joint Probability Distribution
Algorithm 2m-dimensional joint probability distribution estimation algorithm |
Input: Attributes index set , randomized binary string, flipping probability f, convergence accuracy, attribute set, attribute domain size; Output: m-dimensional joint probability distribution; 1: Initialize; 2: for each user do 3: for each attribute do 4: Compute ; 5: end for 6: Compute ; 7: end for 8: Set ; 9: repeat 10: for each user do 11: Compute 12: end for 13: Compute; 14: ; 15: until ; Return: . |
- Step E: The calculation of posterior probability. Given the conditional probability of the binary strings of every user, the Bayes probability can be calculated by
- Step M: Iteratively update the parameter. Replace the prior probabilities of the previous round with the average of the posterior probabilities of N users to generate a new k-dimensional joint probability distribution (Algorithm 2, Line 13), and then return to Step E.
5.2.3. Bayes Network Construction
Algorithm 3 Bayes network construction algorithm |
Input: Dataset D, maximal degree of Bayes network k, attribute set A; Output: Bayes network N; 1: Initialize, ; 2: Randomly pick a node from A as and add it into S, add into N; 3: for to d do 4: For and, add into and compute; 5: Choose the attribute-parent pairs with the maximal I; 6: Add into S and add into N; 7: end for Return:. |
5.2.4. The Refined Bayes Network
Algorithm 4 An improved algorithm for Bayes network construction |
Input: Dataset D, degree of Bayes network k, attribute set A; Output: Bayes network N; 1: Initialize , and ; 2: Compute the entropy H for each attribute in A, and choose the attribute with the maximal entropy as add it into S and into N; 3: for to d do 4: ; 5: if then 6: For and, add and into, then cor ; 7: else 8: For and,add into, then compute; 9: end if 10: Choose the attribute-parent pair with the maximal I and denote as; 11: Choose the attribute-parent pair with the maximal value of and denote as; 12: Add into S and into N; 13: end for Return: |
5.3. Synthesis of Dataset
6. Experimental Evaluation
6.1. Experiment Setup
6.1.1. Dataset
6.1.2. Experiment Methods
6.1.3. Experiment Parameters
6.1.4. Evaluation Indicators
6.2. Results
6.2.1. Bayes Network Construction and Attribute Correlation Accuracy
6.2.2. The Accuracy of Statistical Query
6.2.3. Classification Accuracy
7. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Guo, B.; Wang, Z.; Yu, Z.; Wang, Y.; Yen, N.Y.; Huang, R.; Zhou, X. Mobile crowd sensing and computing: The review of an emerging human-powered sensing paradigm. ACM Comput. Surv. 2015, 48, 1–31. [Google Scholar] [CrossRef]
- Yürür, Ö.; Liu, C.H.; Sheng, Z.; Leung, V.C.; Moreno, W.; Leung, K.K. Context-awareness for mobile sensing: A survey and future directions. IEEE Commun. Surv. Tutor. 2014, 18, 68–93. [Google Scholar] [CrossRef] [Green Version]
- Li, G.; Wang, J.; Zheng, Y.; Franklin, M.J. Crowdsourced data management: A survey. IEEE Trans. Knowl. Data Eng. 2016, 28, 2296–2319. [Google Scholar] [CrossRef] [Green Version]
- Mohammed, N.; Chen, R.; Fung, B.; Yu, P.S. Differentially private data release for data mining. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, 21–24 August 2011; pp. 493–501. [Google Scholar]
- Naveed, M.; Ayday, E.; Clayton, E.W.; Fellay, J.; Gunter, C.A.; Hubaux, J.-P.; Malin, B.A.; Wang, X. Privacy in the genomic era. ACM Comput. Surv. 2015, 48, 1–44. [Google Scholar] [CrossRef] [PubMed]
- Kohavi, R.; Provost, F. Applications of data mining to electronic commerce. In Applications of Data Mining to Electronic Commerce; Springer: Berlin/Heidelberg, Germany, 2001; pp. 5–10. [Google Scholar] [CrossRef] [Green Version]
- Clarke, R. What’s ‘Privacy’. Available online: http://www.rogerclarke.com/DV/Privacy.html (accessed on 24 March 2020).
- Sweeney, L.; Abu, A.; Winn, J. Identifying participants in the personal genome project by name (a re-identification experiment). arXiv 2013, arXiv:1304.7605. [Google Scholar]
- Zhou, X.; Demetriou, S.; He, D.; Naveed, M.; Pan, X.; Wang, X.; Gunter, C.A.; Nahrstedt, K. Identity, location, disease and more: Inferring your secrets from android public resources. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, Berlin, Germany, 4–8 November 2013; pp. 1017–1028. [Google Scholar]
- Jin, H.; Su, L.; Ding, B.; Nahrstedt, K.; Borisov, N. Enabling privacy-preserving incentives for mobile crowd sensing systems. In Proceedings of the 2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS), Nara, Japan, 27–30 June 2016; pp. 344–353. [Google Scholar]
- Sweeney, L. k-anonymity: A model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 2002, 10, 557–570. [Google Scholar] [CrossRef] [Green Version]
- Lu, R.; Liang, X.; Li, X.; Lin, X.; Shen, X. EPPA: An efficient and privacy-preserving aggregation scheme for secure smart grid communications. IEEE Trans. Parallel Distrib. Syst. 2012, 23, 1621–1631. [Google Scholar] [CrossRef] [Green Version]
- Mármol, F.G.; Sorge, C.; Ugus, O.; Pérez, G.M. Do not snoop my habits: Preserving privacy in the smart grid. IEEE Commun. Mag. 2012, 50, 166–172. [Google Scholar] [CrossRef]
- Narayanan, A.; Shmatikov, V. Robust de-anonymization of large sparse datasets. In Proceedings of the 2008 IEEE Symposium on Security and Privacy (sp 2008), Oakland, CA, USA, 18–22 May 2008; pp. 111–125. [Google Scholar]
- Dwork, C.; Roth, A. The algorithmic foundations of differential privacy. Found. Trends® Theor. Comput. Sci. 2014, 9, 211–407. [Google Scholar] [CrossRef]
- Ács, G.; Castelluccia, C. I have a dream! (differentially private smart metering). In Proceedings of the International Workshop on Information Hiding, Prague, Czech Republic, 18–20 May 2011; pp. 118–132. [Google Scholar]
- Zhu, T.; Xiong, P.; Li, G.; Zhou, W. Correlated differential privacy: Hiding information in non-IID data set. IEEE Trans. Inf. Forensics Secur. 2014, 10, 229–242. [Google Scholar] [CrossRef]
- McSherry, F.D. Privacy integrated queries: An extensible platform for privacy-preserving data analysis. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, Providence, RI, USA, 29 June–2 July 2009; pp. 19–30. [Google Scholar]
- Zhang, J.; Cormode, G.; Procopiuc, C.M.; Srivastava, D.; Xiao, X. PrivBayes: Private data release via bayesian networks. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, Snowbird, UT, USA, 22–27 June 2014; pp. 1423–1434. [Google Scholar]
- Chen, R.; Xiao, Q.; Zhang, Y.; Xu, J. Differentially private high-dimensional data publication via sampling-based inference. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, 10–13 August 2015; pp. 129–138. [Google Scholar]
- McSherry, F.; Talwar, K. Mechanism design via differential privacy. In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS’07), Providence, RI, USA, 21–23 October 2007; pp. 94–103. [Google Scholar]
- Su, D.; Cao, J.; Li, N.; Lyu, M. PrivPfC: Differentially private data publication for classification. VLDB J. 2018, 27, 201–223. [Google Scholar] [CrossRef]
- Zhu, T.; Li, G.; Zhou, W.; Philip, S.Y. Differentially private data publishing and analysis: A survey. IEEE Trans. Knowl. Data Eng. 2017, 29, 1619–1638. [Google Scholar] [CrossRef]
- Qardaji, W.; Yang, W.; Li, N. Priview: Practical differentially private release of marginal contingency tables. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, Snowbird, UT, USA, 22–27 June 2014; pp. 1435–1446. [Google Scholar]
- Liu, K.; Terzi, E. Towards identity anonymization on graphs. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, Vancouver, BC, Canada, 10–12 June 2008; pp. 93–106. [Google Scholar]
- Zhang, X.; Chen, L.; Jin, K.; Meng, X. Private high-dimensional data publication with junction tree. J. Comput. Res. Dev. 2018, 55, 2794–2809. [Google Scholar] [CrossRef]
- Wang, S.-L.; Tsai, Z.-Z.; Hong, T.-P.; Ting, I.-H. Anonymizing shortest paths on social network graphs. In Proceedings of the Asian Conference on Intelligent Information and Database Systems, Daegu, Korea, 20–22 April 2011; pp. 129–136. [Google Scholar]
- Erlingsson, Ú.; Pihur, V.; Korolova, A. RAPPOR: Randomized aggregatable privacy-Preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, Scottsdale, AZ, USA, 3–7 November 2014; pp. 1054–1067. [Google Scholar]
- Chen, R.; Li, H.; Qin, A.K.; Kasiviswanathan, S.P.; Jin, H. Private spatial data aggregation in the local setting. In Proceedings of the 2016 IEEE 32nd International Conference on Data Engineering (ICDE), Helsinki, Finland, 16–20 May 2016; pp. 289–300. [Google Scholar]
- Cormode, G.; Kulkarni, T.; Srivastava, D. Marginal release under local differential privacy. In Proceedings of the 2018 International Conference on Management of Data, Houston, TX, USA, 10–15 June 2018; pp. 131–146. [Google Scholar]
- Wang, N.; Xiao, X.; Yang, Y.; Hoang, T.D.; Shin, H.; Shin, J.; Yu, G. PrivTrie: Effective frequent term discovery under local differential privacy. In Proceedings of the 2018 IEEE 34th International Conference on Data Engineering (ICDE), Paris, France, 16–19 April 2018; pp. 821–832. [Google Scholar]
- Ye, Q.; Hu, H.; Meng, X.; Zheng, H. PrivKV: Key-value data collection with local differential privacy. In Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2019; pp. 317–331. [Google Scholar]
- Zhang, Z.; Wang, T.; Li, N.; He, S.; Chen, J. Calm: Consistent adaptive local marginal for marginal release under local differential privacy. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Toronto, ON, Canada, 15–19 October 2018; pp. 212–229. [Google Scholar]
- Xiong, S.; Sarwate, A.D.; Mandayam, N.B. Randomized requantization with local differential privacy. In Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 20–25 March 2016; pp. 2189–2193. [Google Scholar]
- Sarwate, A.D.; Sankar, L. A rate-disortion perspective on local differential privacy. In Proceedings of the 2014 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 30 September–3 October 2014; pp. 903–908. [Google Scholar]
- Warner, S.L. Randomized response: A survey technique for eliminating evasive answer bias. J. Am. Stat. Assoc. 1965, 60, 63–69. [Google Scholar] [CrossRef] [PubMed]
- Kairouz, P.; Bonawitz, K.; Ramage, D. Discrete distribution estimation under local privacy. In Proceedings of the 33rd International Conference on Machine Learning, Proceedings of Machine Learning Research, New York, NY, USA, 20–22 June 2016; pp. 2436–2444. [Google Scholar]
- Kairouz, P.; Oh, S.; Viswanath, P. Extremal mechanisms for local differential privacy. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2014; Available online: https://papers.nips.cc/paper/5392-extremal-mechanisms-for-local-differential-privacy.pdf (accessed on 24 March 2020).
- Wang, S.; Huang, L.; Wang, P.; Nie, Y.; Xu, H.; Yang, W.; Li, X.-Y.; Qiao, C. Mutual information optimally local private discrete distribution estimation. arXiv 2016, arXiv:1607.08025. [Google Scholar]
- Qin, Z.; Yu, T.; Yang, Y.; Khalil, I.; Xiao, X.; Ren, K. Generating Synthetic Decentralized Social Graphs with Local Differential Privacy. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 425–438. [Google Scholar]
- Qin, Z.; Yang, Y.; Yu, T.; Khalil, I.; Xiao, X.; Ren, K. Heavy Hitter Estimation over Set-Valued Data with Local Differential Privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; pp. 192–203. [Google Scholar]
- Hay, M.; Li, C.; Miklau, G.; Jensen, D. Accurate estimation of the degree distribution of private networks. In Proceedings of the 2009 Ninth IEEE International Conference on Data Mining, Miami, FL, USA, 6–9 December 2009; pp. 169–178. [Google Scholar]
- Dwork, C. Differential Privacy. In Proceedings of the Automata, Languages and Programming, Venice, Italy, 10–14 July 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 1–12. [Google Scholar]
- Duchi, J.C.; Jordan, M.I.; Wainwright, M.J. Local privacy and statistical minimax rates. In Proceedings of the 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, Berkeley, CA, USA, 26–29 October 2013; pp. 429–438. [Google Scholar]
- Fanti, G.; Pihur, V.; Erlingsson, Ú. Building a rappor with the unknown: Privacy-preserving learning of associations and data dictionaries. Proc. Priv. Enhancing Technol. 2016, 41–61. [Google Scholar] [CrossRef] [Green Version]
- Ren, X.; Yu, C.-M.; Yu, W.; Yang, S.; Yang, X.; McCann, J.A.; Philip, S.Y. LoPub: High-Dimensional Crowdsourced Data Publication with Local Differential Privacy. IEEE Trans. Inf. Forensics Secur. 2018, 13, 2151–2166. [Google Scholar] [CrossRef] [Green Version]
- Zhang, J.; Cormode, G.; Procopiuc, C.M.; Srivastava, D.; Xiao, X. PrivBayes: Private data release via Bayesian networks. ACM Trans. Database Syst. 2017, 42, 25. [Google Scholar] [CrossRef]
- Acs, G.; Castelluccia, C.; Chen, R. Differentially private histogram publishing through lossy compression. In Proceedings of the 2012 IEEE 12th International Conference on Data Mining, Brussels, Belgium, 10–13 December 2012; pp. 1–10. [Google Scholar]
- Duchi, J.C.; Jordan, M.I.; Wainwright, M.J. Minimax optimal procedures for locally private estimation. J. Amer. Stat. Assoc. 2018, 113, 182–201. [Google Scholar] [CrossRef]
- Wang, N.; Xiao, X.; Yang, Y.; Zhao, J.; Hui, S.C.; Shin, H.; Shin, J.; Yu, G. Collecting and analyzing multidimensional data with local differential privacy. In Proceedings of the 2019 IEEE 35th International Conference on Data Engineering (ICDE), Macao, China, 8–11 April 2019; pp. 638–649. [Google Scholar]
Attribute | Attribute-Parent Pairs |
---|---|
() | |
() | |
() | |
() | |
() |
Dataset | Data Type | Dataset Size | Dimension | Domain Size (Processed) |
---|---|---|---|---|
NLTCS | Binary | 25,174 | 16 | |
Adult | Non-Binary | 45,222 | 15 | |
TPC-E | Non-Binary | 40,000 | 24 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ju, C.; Gu, Q.; Wu, G.; Zhang, S. Local Differential Privacy Protection of High-Dimensional Perceptual Data by the Refined Bayes Network. Sensors 2020, 20, 2516. https://doi.org/10.3390/s20092516
Ju C, Gu Q, Wu G, Zhang S. Local Differential Privacy Protection of High-Dimensional Perceptual Data by the Refined Bayes Network. Sensors. 2020; 20(9):2516. https://doi.org/10.3390/s20092516
Chicago/Turabian StyleJu, Chunhua, Qiuyang Gu, Gongxing Wu, and Shuangzhu Zhang. 2020. "Local Differential Privacy Protection of High-Dimensional Perceptual Data by the Refined Bayes Network" Sensors 20, no. 9: 2516. https://doi.org/10.3390/s20092516
APA StyleJu, C., Gu, Q., Wu, G., & Zhang, S. (2020). Local Differential Privacy Protection of High-Dimensional Perceptual Data by the Refined Bayes Network. Sensors, 20(9), 2516. https://doi.org/10.3390/s20092516