Computer Science > Machine Learning

arXiv:2212.00484 (cs)

[Submitted on 1 Dec 2022 (v1), last revised 23 Apr 2024 (this version, v3)]

Title: Differentially-Private Data Synthetisation for Efficient Re-Identification Risk Control

Authors: Tânia Carvalho, Nuno Moniz, Luís Antunes, Nitesh Chawla

Abstract: Protecting user data privacy can be achieved via many methods, from statistical transformations to generative models. However, all of them have critical drawbacks. For example, creating a transformed data set using traditional techniques is highly time-consuming. Also, recent deep learning-based solutions require significant computational resources in addition to long training phases, and differential privacy-based solutions may undermine data utility. In this paper, we propose $\epsilon$-PrivateSMOTE, a technique designed to safeguard against re-identification and linkage attacks, particularly addressing cases with a high re-identification risk. Our proposal combines synthetic data generation via noise-induced interpolation with differential privacy principles to obfuscate high-risk cases. We demonstrate that $\epsilon$-PrivateSMOTE achieves competitive results in privacy risk and better predictive performance when compared to multiple traditional and state-of-the-art privacy-preservation methods, including generative adversarial networks, variational autoencoders, and differential privacy baselines. We also show that our method improves time requirements by at least a factor of 9 and is a resource-efficient solution that ensures high performance without specialised hardware.
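The abstract describes the core idea only briefly, so the following is a minimal illustrative sketch of what "noise-induced interpolation" for high-risk cases could look like, not the authors' implementation. It assumes a SMOTE-style interpolation between a high-risk record and one of its nearest neighbours, with the usual uniform interpolation factor replaced by Laplace noise of scale 1/$\epsilon$. The function names (`laplace_noise`, `nearest_neighbours`, `noisy_interpolation`), the per-feature noise draw, the Euclidean neighbour search, and the numeric-only records are all assumptions made for illustration; the sketch makes no claim about a formal privacy guarantee.

```python
import math
import random
from typing import List, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def nearest_neighbours(idx: int, rows: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k rows closest (Euclidean) to rows[idx], excluding itself."""
    dists = [(math.dist(rows[idx], r), j) for j, r in enumerate(rows) if j != idx]
    return [j for _, j in sorted(dists)[:k]]


def noisy_interpolation(
    rows: Sequence[Sequence[float]],
    high_risk: Sequence[int],
    epsilon: float,
    k: int = 2,
    per_case: int = 1,
    seed: int = 0,
) -> List[List[float]]:
    """Generate synthetic replacements for the rows listed in high_risk.

    Assumption: each synthetic value is x + (neighbour - x) * noise, where
    noise ~ Laplace(0, 1/epsilon) is drawn independently per feature.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    synthetic: List[List[float]] = []
    for idx in high_risk:
        neighbours = nearest_neighbours(idx, rows, k)
        for _ in range(per_case):
            neighbour = rows[rng.choice(neighbours)]
            synthetic.append(
                [
                    x + (n - x) * laplace_noise(1.0 / epsilon, rng)
                    for x, n in zip(rows[idx], neighbour)
                ]
            )
    return synthetic
```

Under these assumptions, a smaller `epsilon` gives a larger noise scale, so synthetic records land further from the original high-risk record; this mirrors the privacy-utility trade-off the abstract mentions. For example, `noisy_interpolation(rows, high_risk=[3], epsilon=1.0, per_case=3)` would return three synthetic records standing in for row 3.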

Comments:	21 pages, 6 figures and 2 tables
Subjects:	Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Cite as:	arXiv:2212.00484 [cs.LG]
	(or arXiv:2212.00484v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2212.00484

Submission history

From: Tânia Carvalho
[v1] Thu, 1 Dec 2022 13:20:37 UTC (874 KB)
[v2] Fri, 29 Sep 2023 10:00:07 UTC (794 KB)
[v3] Tue, 23 Apr 2024 16:22:07 UTC (1,913 KB)
