[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article
Open access

Measuring Re-identification Risk

Published: 20 June 2023 Publication History

Editorial Notes

The authors have requested minor, non-substantive changes to the Version of Record and, in accordance with ACM policies, a Corrected Version of Record was published on August 10, 2023. For reference purposes, the VoR may still be accessed via the Supplemental Material section on this page.

Abstract

Compact user representations (such as embeddings) form the backbone of personalization services. In this work, we present a new theoretical framework to measure re-identification risk in such user representations. Our framework, based on hypothesis testing, formally bounds the probability that an attacker may be able to obtain the identity of a user from their representation. As an application, we show how our framework is general enough to model important real-world applications such as the Chrome's Topics API for interest-based advertising. We complement our theoretical bounds by showing provably good attack algorithms for re-identification that we use to estimate the re-identification risk in the Topics API. We believe this work provides a rigorous and interpretable notion of re-identification risk and a framework to measure it that can be used to inform real-world applications.

Supplemental Material

MP4 File
Compact user representations (such as embeddings) form the backbone of personalization services. In this work, we present a new theoretical framework to measure re-identification risk in such user representations. Our framework, based on hypothesis testing, formally bounds the probability that an attacker may be able to obtain the identity of a user from their representation. As an application, we show how our framework is general enough to model important real-world applications such as the Chrome's Topics API for interest-based advertising. We complement our theoretical bounds by showing provably good attack algorithms for re-identification that we use to estimate the re-identification risk in the Topics API. We believe this work provides a rigorous and interpretable notion of re-identification risk and a framework to measure it that can be used to inform real-world applications.
PDF File
Version of Record for "Measuring Re-identification Risk" by Carey et al., Proceedings of the ACM on Management of Data, Volume 1, Issue 2 (PACMMOD 1:2).

References

[1]
Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security. ACM, 308--318.
[2]
Gagan Aggarwal, Tomas Feder, Krishnaram Kenthapadi, Rajeev Motwani, Rina Panigrahy, Dilys Thomas, and An Zhu. 2005. Approximation Algorithms for k-Anonymity, In Proceedings of the International Conference on Database Theory (ICDT 2005). Journal of Privacy Technology (JOPT). http://ilpubs.stanford.edu:8090/645/
[3]
Rakesh Agrawal and Ramakrishnan Srikant. 2000. Privacy-preserving data mining. In Proceedings of the 2000 ACM SIGMOD international conference on Management of data. 439--450.
[4]
Má rio S. Alvim, Konstantinos Chatzikokolakis, Annabelle McIver, Carroll Morgan, Catuscia Palamidessi, and Geoffrey Smith. 2020. The Science of Quantitative Information Flow. Springer. https://doi.org/10.1007/978--3--319--96131--6
[5]
Ali Ismail Awad. 2012. Machine Learning Techniques for Fingerprint Identification: A Short Review. In Advanced Machine Learning Technologies and Applications - First International Conference, AMLTA 2012, Cairo, Egypt, December 8--10, 2012. Proceedings (Communications in Computer and Information Science, Vol. 322), Aboul Ella Hassanien, Abdel-Badeeh M. Salem, Rabie A. Ramadan, and Tai-Hoon Kim (Eds.). Springer, 524--531.
[6]
Dzmitry Bahdanau, Kyung Hyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In ICLR.
[7]
Muhammad Ahmad Bashir and Christo Wilson. 2018. Diffusion of User Tracking Data in the Online Advertising Ecosystem. Proc. Priv. Enhancing Technol., Vol. 2018, 4 (2018), 85--103.
[8]
Yoshua Bengio, Aaron C. Courville, and Pascal Vincent. 2013. Representation Learning: A Review and New Perspectives. IEEE Trans. Pattern Anal. Mach. Intell., Vol. 35, 8 (2013), 1798--1828.
[9]
Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. 2011. The Million Song Dataset. In 12th International Society for Music Information Retrieval Conference (ISMIR 2011).
[10]
Richard Blahut. 1974. Hypothesis testing and information theory. IEEE Transactions on Information Theory, Vol. 20, 4 (1974), 405--417.
[11]
Kamalika Chaudhuri, Claire Monteleoni, and Anand D Sarwate. 2011. Differentially private empirical risk minimization. Journal of Machine Learning Research, Vol. 12, 3 (2011).
[12]
Mia Xu Chen, Orhan Firat, Ankur Bapna, Melvin Johnson, Wolfgang Macherey, George Foster, Llion Jones, Mike Schuster, Noam Shazeer, Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Zhifeng Chen, Yonghui Wu, and Macduff Hughes. 2018. The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Melbourne, Australia, 76--86. https://doi.org/10.18653/v1/P18--1008
[13]
Aloni Cohen and Kobbi Nissim. 2020. Towards formalizing the GDPR's notion of singling out. Proceedings of the National Academy of Sciences, Vol. 117, 15 (2020), 8344--8352.
[14]
Graham Cormode, Somesh Jha, Tejas Kulkarni, Ninghui Li, Divesh Srivastava, and Tianhao Wang. 2018. Privacy at scale: Local differential privacy in practice. In Proceedings of the 2018 International Conference on Management of Data. 1655--1658.
[15]
Thomas M. Cover and Joy A. Thomas. 2001. Elements of Information Theory. Wiley.
[16]
Mario Diaz, Hao Wang, Flavio P. Calmon, and Lalitha Sankar. 2020. On the Robustness of Information-Theoretic Privacy Measures and Mechanisms. IEEE Transactions on Information Theory, Vol. 66, 4 (2020), 1949--1978.
[17]
Cynthia Dwork. 2008. Differential Privacy: A Survey of Results. In Theory and Applications of Models of Computation, 5th International Conference, TAMC 2008, Xi'an, China, April 25--29, 2008. Proceedings (Lecture Notes in Computer Science, Vol. 4978), Manindra Agrawal, Ding-Zhu Du, Zhenhua Duan, and Angsheng Li (Eds.). Springer, 1--19.
[18]
Cynthia Dwork. 2019. Differential privacy and the US census. In Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI symposium on principles of database systems. 1--1.
[19]
Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating noise to sensitivity in private data analysis. In Theory of cryptography conference. Springer, 265--284.
[20]
Peter Eckersley. 2010. How Unique is Your Web Browser?. In Proceedings of the 10th International Conference on Privacy Enhancing Technologies (Berlin, Germany). Springer-Verlag, Berlin, Heidelberg, 1--18.
[21]
Alessandro Epasto, André s Mu n oz Medina, Steven Avery, Yijian Bai, Ró bert Busa-Fekete, C. J. Carey, Ya Gao, David Guthrie, Subham Ghosh, James Ioannidis, Junyi Jiao, Jakub Lacki, Jason Lee, Arne Mauser, Brian Milch, Vahab S. Mirrokni, Deepak Ravichandran, Wei Shi, Max Spero, Yunting Sun, Umar Syed, Sergei Vassilvitskii, and Shuo Wang. 2021. Clustering for Private Interest-based Advertising. In KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, Singapore, August 14--18, 2021, Feida Zhu, Beng Chin Ooi, and Chunyan Miao (Eds.). ACM, 2802--2810.
[22]
Alessandro Epasto, Andres Munoz Medina, Christina Ilvento, and Josh Karlin. 2022. Measures of Cross-Site Re-Identification Risk: an Analysis of the Topics API Proposal. https://github.com/patcg-individual-drafts/topics/blob/main/topics_analysis.pdf.
[23]
Alexandre Evfimievski, Johannes Gehrke, and Ramakrishnan Srikant. 2003. Limiting privacy breaches in privacy preserving data mining. In Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. 211--222.
[24]
Alejandro Gó mez-Boix, Pierre Laperdrix, and Benoit Baudry. 2018. Hiding in the Crowd: an Analysis of the Effectiveness of Browser Fingerprinting at Large Scale. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon, France, April 23--27, 2019, Pierre-Antoine Champin, Fabien Gandon, Mounia Lalmas, and Panagiotis G. Ipeirotis (Eds.). ACM, 309--318.
[25]
Google. 2022a. The Privacy Sandbox. Available at https://privacysandbox.com/intl/en_us/ (Accessed October 15, 2022) (2022).
[26]
Google. 2022b. The Topics API. Available at https://github.com/patcg-individual-drafts/topics (Accessed October 15, 2022) (2022).
[27]
Jonathan Halcrow, Alexandru Mosoi, Sam Ruth, and Bryan Perozzi. 2020. Grale: Designing networks for graph learning. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2523--2532.
[28]
Hsiang Hsu, Natalia Martinez, Martin Bertran, Guillermo Sapiro, and Flavio P. Calmon. 2021. A Survey on Statistical, Information, and Estimation-Theoretic Views on Privacy. IEEE BITS the Information Theory Magazine, Vol. 1, 1 (2021), 45--56.
[29]
Shouling Ji, Prateek Mittal, and Raheem Beyah. 2016. Graph data anonymization, de-anonymization attacks, and de-anonymizability quantification: A survey. IEEE Communications Surveys & Tutorials, Vol. 19, 2 (2016), 1305--1326.
[30]
Batya Kenig and Tamir Tassa. 2012. A practical approximation algorithm for optimal k-anonymity. Data Mining and Knowledge Discovery, Vol. 25, 1 (2012), 134--168.
[31]
Pierre Laperdrix, Nataliia Bielova, Benoit Baudry, and Gildas Avoine. 2020. Browser Fingerprinting: A Survey. ACM Transactions on the Web, Vol. 14, 2 (2020).
[32]
Kristen LeFevre, David J DeWitt, and Raghu Ramakrishnan. 2006. Mondrian multidimensional k-anonymity. In 22nd International conference on data engineering (ICDE'06). IEEE, 25--25.
[33]
Ashwin Machanavajjhala, Xi He, and Michael Hay. 2017. Differential privacy in the wild: A tutorial on current practices & open challenges. In Proceedings of the 2017 ACM International Conference on Management of Data. 1727--1730.
[34]
Frank McSherry and Kunal Talwar. 2007. Mechanism design via differential privacy. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07). IEEE, 94--103.
[35]
Adam Meyerson and Ryan Williams. 2004. On the complexity of optimal k-anonymity. In Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. 223--228.
[36]
?ukasz Olejnik, Gunes Acar, Claude Castelluccia, and Claudia Diaz. 2015. The leaking battery. In Data Privacy Management, and Security Assurance. Springer, 254--263.
[37]
Hyoungmin Park and Kyuseok Shim. 2010. Approximate algorithms with generalizing attribute values for k-anonymity. Information Systems, Vol. 35, 8 (2010), 933--955.
[38]
Amrita Roy Chowdhury, Chenghong Wang, Xi He, Ashwin Machanavajjhala, and Somesh Jha. 2020. Crypte: Crypto-assisted differential privacy on untrusted servers. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 603--619.
[39]
Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. 2017. Membership inference attacks against machine learning models. In 2017 IEEE symposium on security and privacy (SP). IEEE, 3--18.
[40]
Geoffrey Smith. 2009. On the foundations of quantitative information flow. In International Conference on Foundations of Software Science and Computational Structures. Springer, 288--302.
[41]
Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Advances in neural information processing systems. 3104--3112.
[42]
Latanya Sweeney. 2002. k-Anonymity: A Model for Protecting Privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst., Vol. 10, 5 (2002), 557--570.
[43]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, ?ukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 5998--6008. http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
[44]
Sergio Verdú. 2015. ?-mutual information. In 2015 Information Theory and Applications Workshop (ITA). 1--6. https://doi.org/10.1109/ITA.2015.7308959
[45]
Vassilios S Verykios, Elisa Bertino, Igor Nai Fovino, Loredana Parasiliti Provenza, Yucel Saygin, and Yannis Theodoridis. 2004. State-of-the-art in privacy preserving data mining. ACM Sigmod Record, Vol. 33, 1 (2004), 50--57.
[46]
Weina Wang, Lei Ying, and Junshan Zhang. 2016. On the Relation Between Identifiability, Differential Privacy, and Mutual-Information Privacy. IEEE Transactions on Information Theory, Vol. 62, 9 (2016), 5018--5029.
[47]
Genqiang Wu, Xianyao Xia, and Yeping He. 2017. Extending Differential Privacy for Treating Dependent Records via Information Theory. CoRR, Vol. abs/1703.07474 (2017). http://arxiv.org/abs/1703.07474
[48]
Mengmeng Yang, Lingjuan Lyu, Jun Zhao, Tianqing Zhu, and Kwok-Yan Lam. 2020. Local Differential Privacy and Its Applications: A Comprehensive Survey. CoRR, Vol. abs/2008.03686 (2020). https://arxiv.org/abs/2008.03686

Cited By

View all
  • (2024)A First View of Topics API Usage in the WildProceedings of the 20th International Conference on emerging Networking EXperiments and Technologies10.1145/3680121.3697810(48-54)Online publication date: 9-Dec-2024
  • (2024)Re-Identification Attacks against the Topics APIACM Transactions on the Web10.1145/367540018:3(1-24)Online publication date: 16-Aug-2024
  • (2024)Supply Chain Financing Risk Identification and Control under the Internet of Things Financial ModelProceedings of the 2024 3rd International Conference on Cyber Security, Artificial Intelligence and Digital Economy10.1145/3672919.3672926(34-38)Online publication date: 1-Mar-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Management of Data
Proceedings of the ACM on Management of Data  Volume 1, Issue 2
PACMMOD
June 2023
2310 pages
EISSN:2836-6573
DOI:10.1145/3605748
Issue’s Table of Contents
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 June 2023
Published in PACMMOD Volume 1, Issue 2

Author Tags

  1. privacy
  2. re-identification risk
  3. user representations

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)317
  • Downloads (Last 6 weeks)64
Reflects downloads up to 10 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)A First View of Topics API Usage in the WildProceedings of the 20th International Conference on emerging Networking EXperiments and Technologies10.1145/3680121.3697810(48-54)Online publication date: 9-Dec-2024
  • (2024)Re-Identification Attacks against the Topics APIACM Transactions on the Web10.1145/367540018:3(1-24)Online publication date: 16-Aug-2024
  • (2024)Supply Chain Financing Risk Identification and Control under the Internet of Things Financial ModelProceedings of the 2024 3rd International Conference on Cyber Security, Artificial Intelligence and Digital Economy10.1145/3672919.3672926(34-38)Online publication date: 1-Mar-2024
  • (2024)The Privacy-Utility Trade-off in the Topics APIProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security10.1145/3658644.3670368(1106-1120)Online publication date: 2-Dec-2024
  • (2024)A Public and Reproducible Assessment of the Topics API on Real Data2024 IEEE Security and Privacy Workshops (SPW)10.1109/SPW63631.2024.00005(1-8)Online publication date: 23-May-2024
  • (2023)A Quantitative Information Flow Analysis of the Topics APIProceedings of the 22nd Workshop on Privacy in the Electronic Society10.1145/3603216.3624959(123-127)Online publication date: 26-Nov-2023

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media