More Web Proxy on the site http://driver.im/

research-article

Open access

DPM: Clustering Sensitive Data through Separation

Authors:

Johannes Liebenow,

Florian Thaeter,

Esfandiar MohammadiAuthors Info & Claims

CCS '24: Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security

Pages 273 - 287

https://doi.org/10.1145/3658644.3690271

Published: 09 December 2024 Publication History

Abstract

Clustering is an important tool for data exploration where the goal is to subdivide a data set into disjoint clusters that fit well into the underlying data structure. When dealing with sensitive data, privacy-preserving algorithms aim to approximate the non-private baseline while minimising the leakage of sensitive information. State-of-the-art privacy-preserving clustering algorithms tend to output clusters that are good in terms of the standard metrics, inertia, silhouette score, and clustering accuracy, however, the clustering result strongly deviates from the non-private KMeans baseline.

In this work, we present a privacy-preserving clustering algorithm called DPM that recursively separates a data set into clusters based on a geometrical clustering approach. In addition, DPM estimates most of the data-dependent hyper-parameters in a privacy-preserving way. We prove that DPM preserves Differential Privacy and analyse the utility guarantees of DPM. Finally, we conduct an extensive empirical evaluation for synthetic and real-life data sets. We show that DPM achieves state-of-the-art utility on the standard clustering metrics and yields a clustering result much closer to that of the popular non-private KMeans algorithm without requiring the number of classes.

References

[1]

Maria-Florina Balcan, Travis Dick, Yingyu Liang, Wenlong Mou, and Hongyang Zhang. 2017. Differentially private clustering in high-dimensional euclidean spaces. In International Conference on Machine Learning. PMLR, 322--331.

[2]

Alisa Chang, Badih Ghazi, Ravi Kumar, and Pasin Manurangsi. 2021. Locally private k-means in one round. In ICML. PMLR, 1441--1451.

[3]

Kamalika Chaudhuri, Claire Monteleoni, and Anand D Sarwate. 2011. Differentially private empirical risk minimization. JMLR 12, 3 (2011).

[4]

E. Cohen, H. Kaplan, Y. Mansour, U. Stemmer, and E. Tsfadia. 2021. Differentiallyprivate clustering of easy instances. In ICML. PMLR, 2049--2059.

[5]

Wenxin Du, Canyon Foot, Monica Moniot, Andrew Bray, and Adam Groce. 2020. Differentially private confidence intervals. arXiv preprint arXiv:2001.02285 (2020).

[6]

Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating Noise to Sensitivity in Private Data Analysis. In Theory of Cryptography, Shai Halevi and Tal Rabin (Eds.). Springer, 265--284.

[7]

Cynthia Dwork, Aaron Roth, et al. 2014. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science 9 (2014).

[8]

D. Feldman, C. Xiang, R. Zhu, and D. Rus. 2017. Coresets for differentially private k-means clustering and applications to privacy in mobile sensor networks. In Proceedings of the 16th ACM/IEEE IPSN.

[9]

B. Ghazi, R. Kumar, and P. Manurangsi. 2020. Differentially private clustering: Tight approximation ratios. Neurips (2020).

[10]

Google. 2021. Google's differential privacy libraries. https://github.com/google/ differential-privacy/

[11]

Alexander Hinneburg and Daniel A Keim. 1999. Optimal grid-clustering: Towards breaking the curse of dimensionality in high-dimensional clustering. (1999).

[12]

N. Holohan, S. Braghin, Pól M. Aonghusa, and K. Levacher. 2019. Diffprivlib: the IBM differential privacy library. ArXiv e-prints 1907.02444 [cs.CR] (July 2019).

[13]

Zhiyi Huang and Jinyan Liu. 2018. Optimal differentially private algorithms for k-means clustering. In Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. 395--408.

Digital Library

[14]

Roger Iyengar, Joseph P Near, Dawn Song, Om Thakkar, Abhradeep Thakurta, and LunWang. 2019. Towards practical differentially private convex optimization. In IEEE S&P. IEEE, 299--316.

[15]

Matthew Jones, Huy L. Nguyen, and Thy D Nguyen. 2021. Differentially Private Clustering via Maximum Coverage. Proceedings of the AAAI Conference on Artificial Intelligence 35, 13 (2021).

[16]

K. LeFevre, D. DeWitt, and R. Ramakrishnan. 2006. Mondrian multidimensional k-anonymity. In 22nd ICDE'06. IEEE, 25--25.

[17]

Samuel Maddock, Graham Cormode, TianhaoWang, Carsten Maple, and Somesh Jha. 2022. Federated Boosted Decision Trees with Differential Privacy. In Proceedings of the 2022 ACM SIGSAC CCS. ACM.

Digital Library

[18]

Huy L. Nguyen, Anamay Chaturvedi, and Eric Z Xu. 2021. Differentially Private k-Means via Exponential Mechanism and Max Cover. Proceedings of the AAAI Conference on Artificial Intelligence 35, 10 (5 2021), 9101--9108.

[19]

Tianjiao Ni, Minghao Qiao, Zhili Chen, Shun Zhang, and Hong Zhong. 2021. Utility-efficient differentially private K-means clustering based on cluster merging. Neurocomputing 424 (2021), 205--214.

[20]

Kobbi Nissim and Uri Stemmer. 2018. Clustering algorithms for the centralized and local models. In Algorithmic Learning Theory. PMLR, 619--653.

[21]

Kobbi Nissim, Uri Stemmer, and Salil Vadhan. 2016. Locating a small cluster privately. In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. 413--427.

Digital Library

[22]

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825--2830.

Digital Library

[23]

M. Shechner, O. Sheffet, and U. Stemmer. 2020. Private k-means clustering with stability assumptions. In AISTATS. PMLR, 2518--2528.

[24]

Uri Stemmer. 2021. Locally private k-means clustering. Journal of Machine Learning Research 22, 176 (2021), 1--30.

[25]

Uri Stemmer and Haim Kaplan. 2018. Differentially private k-means with constant multiplicative error. Advances in Neural Information Processing Systems 31 (2018).

[26]

Dong Su, Jianneng Cao, Ninghui Li, Elisa Bertino, and Hongxia Jin. 2016. Differentially private k-means clustering. In Proceedings of the sixth ACM conference on data and application security and privacy. 26--37.

Digital Library

[27]

Jun Zhang, Xiaokui Xiao, and Xing Xie. 2016. PrivTree: A Differentially Private Algorithm for Hierarchical Decompositions. CoRR (2016). arXiv:1601.03229

Digital Library

[28]

Yaling Zhang, Na Liu, and Shangping Wang. 2018. A differential privacy protecting K-means clustering algorithm based on contour coefficients. PloS one 13, 11 (2018), e0206832.

Index Terms

DPM: Clustering Sensitive Data through Separation
1. Security and privacy
2. Theory of computation

Recommendations

Differential Privacy Protection Algorithm for Data Clustering Center
Machine Learning for Cyber Security
Abstract
With the rapid development of the information era, data mining is widely used in various fields as a technique that can discover the interrelationship between data, but it also inevitably brings some privacy leakage problems. Differential privacy ...
Achieving anonymity via clustering

Publishing data for analysis from a table containing personal records, while maintaining individual privacy, is a problem of increasing importance today. The traditional approach of deidentifying records is to remove identifying fields such as social ...
A High-Availability K-modes Clustering Method Based on Differential Privacy
Algorithms and Architectures for Parallel Processing
Abstract
In categorical data mining, the K-modes algorithm is a classic algorithm that has been widely used. However, the data analyzed by the K-modes algorithm usually contains sensitive user information. If these data are leaked, it will seriously ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

CCS '24: Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security

December 2024

5188 pages

ISBN:9798400706363

DOI:10.1145/3658644

General Chairs:
Bo Luo
University of Kansas, USA
,
Xiaojing Liao
Indiana University Bloomington, USA
,
Jun Xu
University of Utah, USA
,
Program Chairs:
Engin Kirda
Northeastern University, USA
,
David Lie
University of Toronto, Canada

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGSAC: ACM Special Interest Group on Security, Audit, and Control

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 December 2024

Check for updates

Badges

Author Tags

Qualifiers

Research-article

Conference

CCS '24

Sponsor:

SIGSAC

CCS '24: ACM SIGSAC Conference on Computer and Communications Security

October 14 - 18, 2024

UT, Salt Lake City, USA

Acceptance Rates

Overall Acceptance Rate 1,261 of 6,999 submissions, 18%

Upcoming Conference

CCS '25

Sponsor:
sigsac

ACM SIGSAC Conference on Computer and Communications Security

October 13 - 17, 2025

Taipei , Taiwan

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
60
Total Downloads

Downloads (Last 12 months)60
Downloads (Last 6 weeks)60

Reflects downloads up to 24 Dec 2024

Other Metrics

View Author Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents