DOI: 10.1145/3548606.3560694
research-article
Open access

Membership Inference Attacks and Generalization: A Causal Perspective

Published: 07 November 2022

Abstract

Membership inference (MI) attacks highlight a privacy weakness in present stochastic training methods for neural networks. It is not well understood, however, why they arise. Are they only a natural consequence of imperfect generalization? Which underlying causes should we address during training to mitigate these attacks? Towards answering such questions, we propose the first approach to explain MI attacks and their connection to generalization based on principled causal reasoning. We offer causal graphs that quantitatively explain the observed MI attack performance achieved for six attack variants. We refute several prior non-quantitative hypotheses that over-simplify or over-estimate the influence of underlying causes, thereby failing to capture the complex interplay between several factors. Our causal models also show a new connection between generalization and MI attacks via their shared causal factors. Our causal models have high predictive power (0.90), i.e., their analytical predictions often match observations in unseen experiments, which makes analysis via them a pragmatic alternative.
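The question of whether MI attacks are only a consequence of imperfect generalization can be made concrete with the simplest attack that exploits a generalization gap. The following is a minimal sketch of a loss-threshold attack in the spirit of Yeom et al. (CSF 2018); it is not the causal method of this paper. It assumes per-example losses for known members and non-members are available. The helper names, the choice of the mean training loss as the threshold, and all loss values are hypothetical illustrations, not data from the paper.

```python
from statistics import mean


def predict_member(loss: float, threshold: float) -> bool:
    """Hypothetical helper: guess 'member' when the example's loss is at or below threshold."""
    return loss <= threshold


def attack_accuracy(member_losses: list[float], nonmember_losses: list[float],
                    threshold: float) -> float:
    """Balanced accuracy (mean of true-positive and true-negative rates) of the threshold attack."""
    tpr = sum(predict_member(loss, threshold) for loss in member_losses) / len(member_losses)
    tnr = sum(not predict_member(loss, threshold) for loss in nonmember_losses) / len(nonmember_losses)
    return (tpr + tnr) / 2


def loss_gap(member_losses: list[float], nonmember_losses: list[float]) -> float:
    """Mean non-member (test) loss minus mean member (train) loss: a crude generalization-gap proxy."""
    return mean(nonmember_losses) - mean(member_losses)


# Hypothetical losses for an overfit model: train losses sit well below test losses.
members = [0.05, 0.10, 0.02, 0.40]
nonmembers = [0.90, 0.30, 1.20, 0.08]
print(loss_gap(members, nonmembers))                        # positive gap
print(attack_accuracy(members, nonmembers, mean(members)))  # 0.75

# Hypothetical model whose member and non-member losses are identically distributed.
same = [0.1, 0.2, 0.3, 0.4]
print(attack_accuracy(same, same, mean(same)))              # 0.5, i.e. chance
```

The zero-gap case shows the intuition the paper scrutinizes: this particular attack can do no better than chance when member and non-member losses are identically distributed. The abstract's point is that such single-factor explanations can over-simplify; its causal models relate MI attack performance and generalization through several shared causal factors rather than through the gap alone.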

Published In

CCS '22: Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security
November 2022
3598 pages
ISBN: 9781450394505
DOI: 10.1145/3548606
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. causal reasoning
  2. generalization
  3. membership inference attacks

Conference

CCS '22

Acceptance Rates

Overall acceptance rate: 1,261 of 6,999 submissions (18%)

Article Metrics

  • Downloads (last 12 months): 477
  • Downloads (last 6 weeks): 64
Reflects downloads up to 01 Jan 2025.

Cited By

  • (2024) SoK: Unintended Interactions among Machine Learning Defenses and Risks. 2024 IEEE Symposium on Security and Privacy (SP), pp. 2996-3014. DOI: 10.1109/SP54263.2024.00243. Online publication date: 19 May 2024.
  • (2023) "Get in Researchers; We're Measuring Reproducibility": A Reproducibility Study of Machine Learning Papers in Tier 1 Security Conferences. Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, pp. 3433-3459. DOI: 10.1145/3576915.3623130. Online publication date: 15 November 2023.
  • (2023) Membership Inference Attacks against GNN-based Hardware Trojan Detection. 2023 IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pp. 1222-1229. DOI: 10.1109/TrustCom60117.2023.00166. Online publication date: 1 November 2023.
  • (2023) Resisting Membership Inference Attacks by Dynamically Adjusting Loss Targets. 2023 International Conference on Networking and Network Applications (NaNA), pp. 574-579. DOI: 10.1109/NaNA60121.2023.00100. Online publication date: August 2023.
  • (2023) Inference Attack and Privacy Security of Data-driven Industrial Process Monitoring Systems. 2023 IEEE 12th Data Driven Control and Learning Systems Conference (DDCLS), pp. 1312-1319. DOI: 10.1109/DDCLS58216.2023.10165830. Online publication date: 12 May 2023.
  • (2023) Causality-Aided Trade-Off Analysis for Machine Learning Fairness. 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 371-383. DOI: 10.1109/ASE56229.2023.00105. Online publication date: 11 September 2023.

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media