More Web Proxy on the site http://driver.im/

research-article

Open access

Measuring Re-identification Risk

Authors:

Alessandro Epasto,

Adel Javanmard,

Andres Muñoz Medina,

Vahab Mirrokni,

Gabriel Henrique Nunes,

Sergei Vassilvitskii,

Peilin ZhongAuthors Info & Claims

Proceedings of the ACM on Management of Data, Volume 1, Issue 2

Article No.: 149, Pages 1 - 26

https://doi.org/10.1145/3589294

Published: 20 June 2023 Publication History

Editorial Notes

The authors have requested minor, non-substantive changes to the Version of Record and, in accordance with ACM policies, a Corrected Version of Record was published on August 10, 2023. For reference purposes, the VoR may still be accessed via the Supplemental Material section on this page.

Abstract

Compact user representations (such as embeddings) form the backbone of personalization services. In this work, we present a new theoretical framework to measure re-identification risk in such user representations. Our framework, based on hypothesis testing, formally bounds the probability that an attacker may be able to obtain the identity of a user from their representation. As an application, we show how our framework is general enough to model important real-world applications such as the Chrome's Topics API for interest-based advertising. We complement our theoretical bounds by showing provably good attack algorithms for re-identification that we use to estimate the re-identification risk in the Topics API. We believe this work provides a rigorous and interpretable notion of re-identification risk and a framework to measure it that can be used to inform real-world applications.

Supplemental Material

MP4 File

Compact user representations (such as embeddings) form the backbone of personalization services. In this work, we present a new theoretical framework to measure re-identification risk in such user representations. Our framework, based on hypothesis testing, formally bounds the probability that an attacker may be able to obtain the identity of a user from their representation. As an application, we show how our framework is general enough to model important real-world applications such as the Chrome's Topics API for interest-based advertising. We complement our theoretical bounds by showing provably good attack algorithms for re-identification that we use to estimate the re-identification risk in the Topics API. We believe this work provides a rigorous and interpretable notion of re-identification risk and a framework to measure it that can be used to inform real-world applications.

Download
46.05 MB

PDF File

Version of Record for "Measuring Re-identification Risk" by Carey et al., Proceedings of the ACM on Management of Data, Volume 1, Issue 2 (PACMMOD 1:2).

Download
880.77 KB

References

[1]

Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security. ACM, 308--318.

Digital Library

[2]

Gagan Aggarwal, Tomas Feder, Krishnaram Kenthapadi, Rajeev Motwani, Rina Panigrahy, Dilys Thomas, and An Zhu. 2005. Approximation Algorithms for k-Anonymity, In Proceedings of the International Conference on Database Theory (ICDT 2005). Journal of Privacy Technology (JOPT). http://ilpubs.stanford.edu:8090/645/

[3]

Rakesh Agrawal and Ramakrishnan Srikant. 2000. Privacy-preserving data mining. In Proceedings of the 2000 ACM SIGMOD international conference on Management of data. 439--450.

Digital Library

[4]

Má rio S. Alvim, Konstantinos Chatzikokolakis, Annabelle McIver, Carroll Morgan, Catuscia Palamidessi, and Geoffrey Smith. 2020. The Science of Quantitative Information Flow. Springer. https://doi.org/10.1007/978--3--319--96131--6

[5]

Ali Ismail Awad. 2012. Machine Learning Techniques for Fingerprint Identification: A Short Review. In Advanced Machine Learning Technologies and Applications - First International Conference, AMLTA 2012, Cairo, Egypt, December 8--10, 2012. Proceedings (Communications in Computer and Information Science, Vol. 322), Aboul Ella Hassanien, Abdel-Badeeh M. Salem, Rabie A. Ramadan, and Tai-Hoon Kim (Eds.). Springer, 524--531.

[6]

Dzmitry Bahdanau, Kyung Hyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In ICLR.

[7]

Muhammad Ahmad Bashir and Christo Wilson. 2018. Diffusion of User Tracking Data in the Online Advertising Ecosystem. Proc. Priv. Enhancing Technol., Vol. 2018, 4 (2018), 85--103.

[8]

Yoshua Bengio, Aaron C. Courville, and Pascal Vincent. 2013. Representation Learning: A Review and New Perspectives. IEEE Trans. Pattern Anal. Mach. Intell., Vol. 35, 8 (2013), 1798--1828.

Digital Library

[9]

Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. 2011. The Million Song Dataset. In 12th International Society for Music Information Retrieval Conference (ISMIR 2011).

[10]

Richard Blahut. 1974. Hypothesis testing and information theory. IEEE Transactions on Information Theory, Vol. 20, 4 (1974), 405--417.

Digital Library

[11]

Kamalika Chaudhuri, Claire Monteleoni, and Anand D Sarwate. 2011. Differentially private empirical risk minimization. Journal of Machine Learning Research, Vol. 12, 3 (2011).

[12]

Mia Xu Chen, Orhan Firat, Ankur Bapna, Melvin Johnson, Wolfgang Macherey, George Foster, Llion Jones, Mike Schuster, Noam Shazeer, Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Zhifeng Chen, Yonghui Wu, and Macduff Hughes. 2018. The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Melbourne, Australia, 76--86. https://doi.org/10.18653/v1/P18--1008

[13]

Aloni Cohen and Kobbi Nissim. 2020. Towards formalizing the GDPR's notion of singling out. Proceedings of the National Academy of Sciences, Vol. 117, 15 (2020), 8344--8352.

[14]

Graham Cormode, Somesh Jha, Tejas Kulkarni, Ninghui Li, Divesh Srivastava, and Tianhao Wang. 2018. Privacy at scale: Local differential privacy in practice. In Proceedings of the 2018 International Conference on Management of Data. 1655--1658.

Digital Library

[15]

Thomas M. Cover and Joy A. Thomas. 2001. Elements of Information Theory. Wiley.

[16]

Mario Diaz, Hao Wang, Flavio P. Calmon, and Lalitha Sankar. 2020. On the Robustness of Information-Theoretic Privacy Measures and Mechanisms. IEEE Transactions on Information Theory, Vol. 66, 4 (2020), 1949--1978.

[17]

Cynthia Dwork. 2008. Differential Privacy: A Survey of Results. In Theory and Applications of Models of Computation, 5th International Conference, TAMC 2008, Xi'an, China, April 25--29, 2008. Proceedings (Lecture Notes in Computer Science, Vol. 4978), Manindra Agrawal, Ding-Zhu Du, Zhenhua Duan, and Angsheng Li (Eds.). Springer, 1--19.

[18]

Cynthia Dwork. 2019. Differential privacy and the US census. In Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI symposium on principles of database systems. 1--1.

Digital Library

[19]

Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating noise to sensitivity in private data analysis. In Theory of cryptography conference. Springer, 265--284.

[20]

Peter Eckersley. 2010. How Unique is Your Web Browser?. In Proceedings of the 10th International Conference on Privacy Enhancing Technologies (Berlin, Germany). Springer-Verlag, Berlin, Heidelberg, 1--18.

Digital Library

[21]

Alessandro Epasto, André s Mu n oz Medina, Steven Avery, Yijian Bai, Ró bert Busa-Fekete, C. J. Carey, Ya Gao, David Guthrie, Subham Ghosh, James Ioannidis, Junyi Jiao, Jakub Lacki, Jason Lee, Arne Mauser, Brian Milch, Vahab S. Mirrokni, Deepak Ravichandran, Wei Shi, Max Spero, Yunting Sun, Umar Syed, Sergei Vassilvitskii, and Shuo Wang. 2021. Clustering for Private Interest-based Advertising. In KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, Singapore, August 14--18, 2021, Feida Zhu, Beng Chin Ooi, and Chunyan Miao (Eds.). ACM, 2802--2810.

[22]

Alessandro Epasto, Andres Munoz Medina, Christina Ilvento, and Josh Karlin. 2022. Measures of Cross-Site Re-Identification Risk: an Analysis of the Topics API Proposal. https://github.com/patcg-individual-drafts/topics/blob/main/topics_analysis.pdf.

[23]

Alexandre Evfimievski, Johannes Gehrke, and Ramakrishnan Srikant. 2003. Limiting privacy breaches in privacy preserving data mining. In Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. 211--222.

Digital Library

[24]

Alejandro Gó mez-Boix, Pierre Laperdrix, and Benoit Baudry. 2018. Hiding in the Crowd: an Analysis of the Effectiveness of Browser Fingerprinting at Large Scale. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon, France, April 23--27, 2019, Pierre-Antoine Champin, Fabien Gandon, Mounia Lalmas, and Panagiotis G. Ipeirotis (Eds.). ACM, 309--318.

[25]

Google. 2022a. The Privacy Sandbox. Available at https://privacysandbox.com/intl/en_us/ (Accessed October 15, 2022) (2022).

[26]

Google. 2022b. The Topics API. Available at https://github.com/patcg-individual-drafts/topics (Accessed October 15, 2022) (2022).

[27]

Jonathan Halcrow, Alexandru Mosoi, Sam Ruth, and Bryan Perozzi. 2020. Grale: Designing networks for graph learning. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2523--2532.

Digital Library

[28]

Hsiang Hsu, Natalia Martinez, Martin Bertran, Guillermo Sapiro, and Flavio P. Calmon. 2021. A Survey on Statistical, Information, and Estimation-Theoretic Views on Privacy. IEEE BITS the Information Theory Magazine, Vol. 1, 1 (2021), 45--56.

[29]

Shouling Ji, Prateek Mittal, and Raheem Beyah. 2016. Graph data anonymization, de-anonymization attacks, and de-anonymizability quantification: A survey. IEEE Communications Surveys & Tutorials, Vol. 19, 2 (2016), 1305--1326.

Digital Library

[30]

Batya Kenig and Tamir Tassa. 2012. A practical approximation algorithm for optimal k-anonymity. Data Mining and Knowledge Discovery, Vol. 25, 1 (2012), 134--168.

Digital Library

[31]

Pierre Laperdrix, Nataliia Bielova, Benoit Baudry, and Gildas Avoine. 2020. Browser Fingerprinting: A Survey. ACM Transactions on the Web, Vol. 14, 2 (2020).

Digital Library

[32]

Kristen LeFevre, David J DeWitt, and Raghu Ramakrishnan. 2006. Mondrian multidimensional k-anonymity. In 22nd International conference on data engineering (ICDE'06). IEEE, 25--25.

Digital Library

[33]

Ashwin Machanavajjhala, Xi He, and Michael Hay. 2017. Differential privacy in the wild: A tutorial on current practices & open challenges. In Proceedings of the 2017 ACM International Conference on Management of Data. 1727--1730.

Digital Library

[34]

Frank McSherry and Kunal Talwar. 2007. Mechanism design via differential privacy. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07). IEEE, 94--103.

Digital Library

[35]

Adam Meyerson and Ryan Williams. 2004. On the complexity of optimal k-anonymity. In Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. 223--228.

Digital Library

[36]

?ukasz Olejnik, Gunes Acar, Claude Castelluccia, and Claudia Diaz. 2015. The leaking battery. In Data Privacy Management, and Security Assurance. Springer, 254--263.

[37]

Hyoungmin Park and Kyuseok Shim. 2010. Approximate algorithms with generalizing attribute values for k-anonymity. Information Systems, Vol. 35, 8 (2010), 933--955.

Digital Library

[38]

Amrita Roy Chowdhury, Chenghong Wang, Xi He, Ashwin Machanavajjhala, and Somesh Jha. 2020. Crypte: Crypto-assisted differential privacy on untrusted servers. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 603--619.

Digital Library

[39]

Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. 2017. Membership inference attacks against machine learning models. In 2017 IEEE symposium on security and privacy (SP). IEEE, 3--18.

[40]

Geoffrey Smith. 2009. On the foundations of quantitative information flow. In International Conference on Foundations of Software Science and Computational Structures. Springer, 288--302.

[41]

Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Advances in neural information processing systems. 3104--3112.

[42]

Latanya Sweeney. 2002. k-Anonymity: A Model for Protecting Privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst., Vol. 10, 5 (2002), 557--570.

Digital Library

[43]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, ?ukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 5998--6008. http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf

Digital Library

[44]

Sergio Verdú. 2015. ?-mutual information. In 2015 Information Theory and Applications Workshop (ITA). 1--6. https://doi.org/10.1109/ITA.2015.7308959

[45]

Vassilios S Verykios, Elisa Bertino, Igor Nai Fovino, Loredana Parasiliti Provenza, Yucel Saygin, and Yannis Theodoridis. 2004. State-of-the-art in privacy preserving data mining. ACM Sigmod Record, Vol. 33, 1 (2004), 50--57.

Digital Library

[46]

Weina Wang, Lei Ying, and Junshan Zhang. 2016. On the Relation Between Identifiability, Differential Privacy, and Mutual-Information Privacy. IEEE Transactions on Information Theory, Vol. 62, 9 (2016), 5018--5029.

Digital Library

[47]

Genqiang Wu, Xianyao Xia, and Yeping He. 2017. Extending Differential Privacy for Treating Dependent Records via Information Theory. CoRR, Vol. abs/1703.07474 (2017). http://arxiv.org/abs/1703.07474

[48]

Mengmeng Yang, Lingjuan Lyu, Jun Zhao, Tianqing Zhu, and Kwok-Yan Lam. 2020. Local Differential Privacy and Its Applications: A Comprehensive Survey. CoRR, Vol. abs/2008.03686 (2020). https://arxiv.org/abs/2008.03686

Cited By

Verna AJha NTrevisan MMellia MGopalakrishnan VWang JAyyub Qazi ITyson G(2024)A First View of Topics API Usage in the WildProceedings of the 20th International Conference on emerging Networking EXperiments and Technologies10.1145/3680121.3697810(48-54)Online publication date: 9-Dec-2024
https://dl.acm.org/doi/10.1145/3680121.3697810
Jha NTrevisan MLeonardi EMellia M(2024)Re-Identification Attacks against the Topics APIACM Transactions on the Web10.1145/367540018:3(1-24)Online publication date: 16-Aug-2024
https://dl.acm.org/doi/10.1145/3675400
Wu Y(2024)Supply Chain Financing Risk Identification and Control under the Internet of Things Financial ModelProceedings of the 2024 3rd International Conference on Cyber Security, Artificial Intelligence and Digital Economy10.1145/3672919.3672926(34-38)Online publication date: 1-Mar-2024
https://dl.acm.org/doi/10.1145/3672919.3672926
Show More Cited By

Index Terms

Measuring Re-identification Risk
1. Computing methodologies
  1. Machine learning
2. Security and privacy
  1. Human and societal aspects of security and privacy
    1. Privacy protections

Recommendations

A unified framework for evaluating the risk of re-identification of text de-identification tools

Display Omitted Existing methods for evaluating text de-identification tools do not take into account the distribution of identifiers in documents.We propose an evaluation framework that addresses these shortcomings.The framework is used to evaluate a ...
WristPrint: Characterizing User Re-identification Risks from Wrist-worn Accelerometry Data
CCS '21: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security

Public release of wrist-worn motion sensor data is growing. They enable and accelerate research in developing new algorithms to passively track daily activities, resulting in improved health and wellness utilities of smartwatches and activity trackers. ...
How Adversarial Assumptions Influence Re-identification Risk Measures: A COVID-19 Case Study
Privacy in Statistical Databases
Abstract
The COVID-19 pandemic highlights the need for broad dissemination of case surveillance data. Local and global public health agencies have initiated efforts to do so, but there remains limited data available, due in part to concerns over privacy. ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Management of Data

Proceedings of the ACM on Management of Data Volume 1, Issue 2

PACMMOD

June 2023

2310 pages

EISSN:2836-6573

DOI:10.1145/3605748

Editor:
Divyakant Agrawal
UC Santa Barbara, United States

Issue’s Table of Contents

Copyright © 2023 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 June 2023

Published in PACMMOD Volume 1, Issue 2

Author Tags

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

6
Total Citations
View Citations
385
Total Downloads

Downloads (Last 12 months)317
Downloads (Last 6 weeks)64

Reflects downloads up to 10 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Verna AJha NTrevisan MMellia MGopalakrishnan VWang JAyyub Qazi ITyson G(2024)A First View of Topics API Usage in the WildProceedings of the 20th International Conference on emerging Networking EXperiments and Technologies10.1145/3680121.3697810(48-54)Online publication date: 9-Dec-2024
https://dl.acm.org/doi/10.1145/3680121.3697810
Jha NTrevisan MLeonardi EMellia M(2024)Re-Identification Attacks against the Topics APIACM Transactions on the Web10.1145/367540018:3(1-24)Online publication date: 16-Aug-2024
https://dl.acm.org/doi/10.1145/3675400
Wu Y(2024)Supply Chain Financing Risk Identification and Control under the Internet of Things Financial ModelProceedings of the 2024 3rd International Conference on Cyber Security, Artificial Intelligence and Digital Economy10.1145/3672919.3672926(34-38)Online publication date: 1-Mar-2024
https://dl.acm.org/doi/10.1145/3672919.3672926
Alvim MFernandes NMcIver ANunes GLuo BLiao XXu JKirda ELie D(2024)The Privacy-Utility Trade-off in the Topics APIProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security10.1145/3658644.3670368(1106-1120)Online publication date: 2-Dec-2024
https://dl.acm.org/doi/10.1145/3658644.3670368
Beugin YMcDaniel P(2024)A Public and Reproducible Assessment of the Topics API on Real Data2024 IEEE Security and Privacy Workshops (SPW)10.1109/SPW63631.2024.00005(1-8)Online publication date: 23-May-2024
https://doi.org/10.1109/SPW63631.2024.00005
Alvim MFernandes NMcIver ANunes GKnijnenburg BPapadimitratos P(2023)A Quantitative Information Flow Analysis of the Topics APIProceedings of the 22nd Workshop on Privacy in the Electronic Society10.1145/3603216.3624959(123-127)Online publication date: 26-Nov-2023
https://dl.acm.org/doi/10.1145/3603216.3624959

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents