[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.5555/3620237.3620507guideproceedingsArticle/Chapter ViewAbstractPublication PagessecConference Proceedingsconference-collections
research-article

HOLMES: efficient distribution testing for secure collaborative learning

Published: 09 August 2023 Publication History

Abstract

Using secure multiparty computation (MPC), organizations which own sensitive data (e.g., in healthcare, finance or law enforcement) can train machine learning models over their joint dataset without revealing their data to each other. At the same time, secure computation restricts operations on the joint dataset, which impedes computation to assess its quality. Without such an assessment, deploying a jointly trained model is potentially illegal. Regulations, such as the European Union's General Data Protection Regulation (GDPR), require organizations to be legally responsible for the errors, bias, or discrimination caused by their machine learning models. Hence, testing data quality emerges as an indispensable step in secure collaborative learning. However, performing distribution testing is prohibitively expensive using current techniques, as shown in our experiments.
We present HOLMES, a protocol for performing distribution testing efficiently. In our experiments, compared with three non-trivial baselines, HOLMES achieves a speedup of more than 10× for classical distribution tests and up to 104× for multidimensional tests. The core of HOLMES is a hybrid protocol that integrates MPC with zero-knowledge proofs and a new ZK-friendly and naturally oblivious sketching algorithm for multidimensional tests, both with significantly lower computational complexity and concrete execution costs.

References

[1]
Karen Hao. AI is sending people to jail--and getting it wrong. https://www.technologyreview.com/2019/01/21/137783/algorithms-criminal-justice-ai/.
[2]
Ganes Kesari. AI Can Now Detect Depression From Your Voice, And It's Twice As Accurate As Human Practitioners. https://bit.ly/32HdcUQ.
[3]
Ellipsis Health. https://www.ellipsishealth.com/.
[4]
Qbtech. https://www.qbtech.com/.
[5]
McKinsey Global Institute. Tackling bias in artificial intelligence (and in humans). https://mck.co/3Ge65B8.
[6]
Zhe Yu, Joymallya Chakraborty, and Tim Menzies. "FairBalance: Improving Machine Learning Fairness on Multiple Sensitive Attributes With Data Balancing". In: Arxiv:2107.08310.
[7]
Angelina Wang, Arvind Narayanan, and Olga Russakovsky. "REVISE: A tool for measuring and mitigating bias in visual datasets". In: ECCV '20.
[8]
Faisal Kamiran and Toon Calders. "Data preprocessing techniques for classification without discrimination". In: Knowledge and Information Systems 33.1 (2012), pp. 1-33.
[9]
Thomas Davidson, Debasmita Bhattacharya, and Ingmar Weber. "Racial Bias in Hate Speech and Abusive Language Detection Datasets". In: Workshop on Abusive Language Online '19.
[10]
Nature Communications Editorial. "Data sharing and the future of science". In: Nature Communications '18.
[11]
Heather A. Piwowar and Todd J. Vision. "Data reuse and the open data citation advantage". In: PeerJ '13.
[12]
Milton Packer. "Data sharing in medical research". In: British Medical Journal '18.
[13]
Liina Kamm, Dan Bogdanov, Sven Laur, and Jaak Vilo. "A new way to protect privacy in large-scale genome-wide association studies". In: Bioinformatics '13.
[14]
Emmanuel A Abbe, Amir E Khandani, and AndrewW Lo. "Privacy-preserving methods for sharing financial risk exposures". In: American Economic Review '12.
[15]
General Data Protection Regulation. https://gdpr-info.eu/.
[16]
Rights related to automated decision making including profiling. https://bit.ly/3ALUJ6m.
[17]
Assessing Data Quality for Healthcare Systems Data Used in Clinical Research. https://dcricollab.dcri.duke.edu/sites/NIHKR/KR/Assessing-data-quality_V1%200.pdf.
[18]
Rishabh Poddar, Sukrit Kalra, Avishay Yanai, Ryan Deng, Raluca Ada Popa, and Joseph M. Hellerstein. "Senate: A Maliciously-Secure MPC Platform for Collaborative Analytics". In: SEC '20.
[19]
Wenting Zheng, Raluca Ada Popa, Joseph E. Gonzalez, and Ion Stoica. "Helen: Maliciously Secure Coopetitive Learning for Linear Models". In: S&P '19.
[20]
Sameer Wagh, Shruti Tople, Fabrice Benhamouda, Eyal Kushilevitz, Prateek Mittal, and Tal Rabin. "Falcon: Honest-Majority Maliciously Secure Framework for Private Deep Learning". In: PETS '21.
[21]
Sameer Wagh, Divya Gupta, and Nishanth Chandran. "SecureNN: 3-Party Secure Computation for Neural Network Training". In: PETS '19.
[22]
Multi-Protocol SPDZ. https://github.com/data61/MP-SPDZ.
[23]
SCALE and MAMBA. https://github.com/KULeuven-COSIC/SCALE-MAMBA.
[24]
Kang Yang, Pratik Sarkar, Chenkai Weng, and Xiao Wang. "QuickSilver: Efficient and Affordable Zero-Knowledge Proofs for Circuits and Polynomials over Any Field". In: CCS '21.
[25]
Srinath Setty. "Spartan: Efficient and General-Purpose zkSNARKs Without Trusted Setup". In: CRYPTO '20.
[26]
Oded Goldreich. "Towards a theory of software protection". In: CRYPTO '86.
[27]
Oded Goldreich. "Towards a theory of software protection and simulation by oblivious RAMs". In: STOC '87.
[28]
Oded Goldreich and Rafail Ostrovsky. "Software protection and simulation on oblivious RAMs". In: JACM '96.
[29]
Benny Pinkas and Tzachy Reinman. "Oblivious RAM revisited". In: CRYPTO '10.
[30]
Elaine Shi, T-H Hubert Chan, Emil Stefanov, and Mingfei Li. "Oblivious RAM with O (logN3) worstcase cost". In: ASIACRYPT '11.
[31]
Emil Stefanov, Elaine Shi, and Dawn Song. "Towards practical oblivious RAM". In: NDSS '12.
[32]
Emil Stefanov, Marten Van Dijk, Elaine Shi, Christopher Fletcher, Ling Ren, Xiangyao Yu, and Srinivas Devadas. "Path ORAM: An extremely simple oblivious RAM protocol". In: CCS '13.
[33]
Craig Gentry, Kenny A Goldman, Shai Halevi, Charanjit Julta, Mariana Raykova, and Daniel Wichs. "Optimizing ORAM and using it efficiently for secure computation". In: PETS '13.
[34]
William B Johnson and Joram Lindenstrauss. "Extensions of Lipschitz mappings into a Hilbert space". In: Contemporary mathematics '84.
[35]
Lorenzo Grassi, Dmitry Khovratovich, Christian Rechberger, Arnab Roy, and Markus Schofnegger. "POSEIDON: A New Hash Function for Zero-Knowledge Proof System". In: SEC '21.
[36]
Abdelrahaman Aly, Tomer Ashur, Eli Ben-Sasson, Siemen Dhooghe, and Alan Szepieniec. "Design of Symmetric-Key Primitives for Advanced Cryptographic Protocols". In: FSE '20.
[37]
Ivan Bjerre Damgård. "On the randomness of Legendre and Jacobi sequences". In: CRYPTO '88.
[38]
Lorenzo Grassi, Christian Rechberger, Dragos Rotaru, Peter Scholl, and Nigel P. Smart. "MPC-friendly symmetric key primitives". In: CCS '16.
[39]
Dmitry Khovratovich. "Key recovery attacks on the Legendre PRFs within the birthday bound". In: IACR ePrint 2019/862.
[40]
Alexander May and Floyd Zweydinger. "Legendre PRF (Multiple) Key Attacks and the Power of Preprocessing". In: IACR ePrint 2021/645.
[41]
Mark N. Wegman and J. Lawrence Carter. "New hash functions and their use in authentication and set equality". In: JCSS '81.
[42]
Moni Naor and Moti Yung. "Universal one-way hash functions and their cryptographic applications". In: STOC '89.
[43]
Sérgio Moro, Paulo Cortez, and Paulo Ritaa. "A data-driven approach to predict the success of bank telemarketing". In: Decision Support Systems '14.
[44]
Bank Marketing Data Set. https://archive.ics.uci.edu/ml/datasets/Bank+Marketing.
[45]
Beata Strack, Jonathan P. DeShazo, Chris Gennings, Juan L. Olmo, Sebastian Ventura, Krzysztof J. Cios, and John N. Clore. "Impact of HbA1c measurement on hospital readmission rates: Analysis of 70,000 clinical database patient records". In: BioMed Research International '14.
[46]
Diabetes 130-US hospitals for years 1999-2008 Data Set. https://archive.ics.uci.edu/ml/datasets/Diabetes+130-US+hospitals+for+years+1999-2008.
[47]
Real Time Advertiser's Auction. https://www.kaggle.com/saurav9786/real-time-advertisers-auction.
[48]
Ran Canetti, Yehuda Lindell, Rafail Ostrovsky, and Amit Sahai. "Universally composable two-party and multi-party secure computation". In: STOC '02.
[49]
Oded Goldreich, Silvio Micali, and Avi Wigderson. "How to Play any Mental Game or A Completeness Theorem for Protocols with Honest Majority". In: STOC '87.
[50]
Yehuda Lindel. "How to simulate it: A tutorial on the simulation proof technique". In: Tutorials on the Foundations of Cryptography: Dedicated to Oded Goldreich. 2017.
[51]
Ian Chang, Katerina Sotiraki, Weikeng Chen, Murat Kantarcioglu, and Raluca Ada Popa. HOLMES: Efficient Distribution Testing for Secure Collaborative Learning. Cryptology ePrint Archive, Paper 2021/1517. https://eprint.iacr.org/2021/1517. 2021.
[52]
Manuel Blum. "Coin flipping by telephone a protocol for solving impossible problems". In: ACM SIGACT News '83.
[53]
Carl Friedrich Gauss. "Theoria combinationis obsevationum erroribus minimis obnoxiae". In: Carl Friedrich Gauss Werke. 1823.
[54]
Ery Arias-Castro, Bruno Pelletier, and Venkatesh Saligrama. "Remember the curse of dimensionality: the case of goodness-of-fit testing in arbitrary dimension". In: Journal of Nonparametric Statistics '18.
[55]
Robert B Davies. "Algorithm AS 155: The distribution of a linear combination of χ2 random variables". In: Applied Statistics (1980), pp. 323-333.
[56]
J Sheil and I O'Muircheartaigh. "Algorithm AS 106: The distribution of non-negative quadratic forms in normal variables". In: Journal of the Royal Statistical Society. Series C (Applied Statistics) 26.1 (1977), pp. 92-98.
[57]
Jean-Pierre Imhof. "Computing the distribution of quadratic forms in normal variables". In: Biometrika 48.3/4 (1961), pp. 419-426.
[58]
Dimitris Achlioptas. "Database-Friendly Random Projections: Johnson-Lindenstrauss with Binary Coins". In: Journal of Computer and System Sciences '03.
[59]
Geoffroy Couteau, Peter Rindal, and Srinivasan Raghuraman. "Silver: Silent VOLE and Oblivious Transfer from Hardness of Decoding Structured LDPC Codes". In: CRYPTO '21.
[60]
Marcel Keller. "MP-SPDZ: A Versatile Framework for Multi-Party Computation". In: CCS '20.
[61]
Michael Ben-Or, Shafi Goldwasser, and Avi Wigderson. "Completeness Theorems for Non-Cryptographic Fault-Tolerant Distributed Computation". In: STOC '88.
[62]
Andrew Chi-Chih Yao. "Protocols for Secure Computations". In: FOCS '82.
[63]
Ivan Damgård, Valerio Pastro, Nigel P. Smart, and Sarah Zakarias. "Multiparty Computation from Somewhat Homomorphic Encryption". In: CRYPTO '12.
[64]
Ivan Damgård, Marcel Keller, Enrique Larraia, Valerio Pastro, Peter Scholl, and Nigel P. Smart. "Practical Covertly Secure MPC for Dishonest Majority - Or: Breaking the SPDZ Limits". In: ESORICS '13.
[65]
Marcel Keller, Valerio Pastro, and Dragos Rotaru. "Overdrive: Making SPDZ Great Again". In: EUROCRYPT '18.
[66]
Xiao Wang, Samuel Ranellucci, and Jonathan Katz. "Global-Scale Secure Multiparty Computation". In: CCS '17.
[67]
Carmit Hazay, Peter Scholl, and Eduardo Soria-Vazquez. "Low Cost Constant Round MPC Combining BMR and Oblivious Transfer". In: ASIACRYPT '17.
[68]
Kang Yang, Xiao Wang, and Jiang Zhang. "More Efficient MPC from Improved Triple Generation and Authenticated Garbling". In: CCS '20.
[69]
Kang Yang, Chenkai Weng, Xiao Lan, Jiang Zhang, and Xiao Wang. "Ferret: Fast Extension for Correlated OT with small communication". In: CCS '20.
[70]
Shafi Goldwasser, Silvio Micali, and Charles Rack-off. "The Knowledge Complexity of Interactive Proof-Systems". In: STOC '85.
[71]
Jonathan Bootle, Andrea Cerulli, Pyrros Chaidos, Jens Groth, and Christophe Petit. "Efficient Zero-Knowledge Arguments for Arithmetic Circuits in the Discrete Log Setting". In: EUROCRYPT '16.
[72]
Benedikt Bünz, Jonathan Bootle, Dan Boneh, Andrew Poelstra, Pieter Wuille, and Gregory Maxwell. "Bulletproofs: Short Proofs for Confidential Transactions and More". In: S&P '18.
[73]
Tiancheng Xie, Jiaheng Zhang, Yupeng Zhang, Charalampos Papamanthou, and Dawn Song. "Libra: Succinct Zero-Knowledge Proofs with Optimal Prover Computation". In: CRYPTO '19.
[74]
Jiaheng Zhang, Tiancheng Xie, Yupeng Zhang, and Dawn Song. "Transparent Polynomial Delegation and Its Applications to Zero Knowledge Proof". In: S&P '20.
[75]
Chenkai Weng, Kang Yang, Xiang Xie, Jonathan Katz, and Xiao Wang. "Mystique: Efficient Conversions for Zero-Knowledge Proofs with Applications to Machine Learning". In: SEC '21.
[76]
Chenkai Weng, Kang Yang, Xiao Wang, and Jonathan Katz. "Wolverine: Fast, Scalable, and Communication-Efficient Zero-Knowledge Proofs for Boolean and Arithmetic Circuits". In: S&P '21.
[77]
Samuel Dittmer, Yuval Ishai, and Rafail Ostrovsky. "Line-Point Zero Knowledge and Its Applications". In: ITC '21.
[78]
Carsten Baum, Alex J. Malozemoff, Marc B. Rosen, and Peter Scholl. "Mac'n'Cheese: Zero-Knowledge Proofs for Boolean and Arithmetic Circuits with Nested Disjunctions". In: CRYPTO '21.
[79]
Elette Boyle, Geoffroy Couteau, Niv Gilboa, and Yuval Ishai. "Compressing Vector OLE". In: CCS '18.
[80]
Elette Boyle, Geoffroy Couteau, Niv Gilboa, Yuval Ishai, Lisa Kohl, and Peter Scholl. "Efficient Pseudo-random Correlation Generators: Silent OT Extension and More". In: CRYPTO '19.
[81]
Samuel Dittmer, Yuval Ishai, and Rafail Ostrovsky. "Line-Point Zero Knowledge and Its Applications". In: ITC '21.
[82]
Samuel Dittmer, Yuval Ishai, Steve Lu, and Rafail Ostrovsky. Improving Line-Point Zero Knowledge: Two Multiplications for the Price of One. Cryptology ePrint Archive, Paper 2022/552. https://eprint.iacr.org/2022/552. 2022. URL: https://eprint.iacr.org/2022/552.
[83]
Alexandr Andoni, Tal Malkin, and Negev Shekel Nosatzki. "Two party distribution testing: Communication and security". In: arXiv preprint arXiv:1811.04065 (2018).
[84]
Varun Narayanan, Manoj Mishra, and Vinod M Prabhakaran. "Private two-terminal hypothesis testing". In: ISIT '20.
[85]
Henry Corrigan-Gibbs and Dan Boneh. "Prio: Private, robust, and scalable computation of aggregate statistics". In: NSDI '17.
[86]
Surya Addanki, Kevin Garbe, Eli Jaffe, Rafail Ostrovsky, and Antigoni Polychroniadou. "Prio+: Privacy Preserving Aggregate Statistics via Boolean Shares". In: IACR ePrint 2021/576.
[87]
Payman Mohassel and Yupeng Zhang. "SecureML: A system for scalable privacy-preserving machine learning". In: S&P '17.
[88]
Melissa Chase, Ran Gilad-Bachrach, Kim Laine, Kristin E. Lauter, and Peter Rindal. "Private Collaborative Neural Network Learning". In: IACR ePrint 2017/762.
[89]
Irene Giacomelli, Somesh Jha, Marc Joye, C David Page, and Kyonghwan Yoon. "Privacy-preserving ridge regression with only linearly-homomorphic encryption". In: ACNS '18.
[90]
Jian Liu, Mika Juuti, Yao Lu, and Nadarajah Asokan. "Oblivious neural network predictions via MiniONN transformations". In: CCS '17.
[91]
Chiraag Juvekar, Vinod Vaikuntanathan, and Anantha Chandrakasan. "GAZELLE: A low latency framework for secure neural network inference". In: SEC '18.
[92]
M Sadegh Riazi, Mohammad Samragh, Hao Chen, Kim Laine, Kristin Lauter, and Farinaz Koushanfar. "XONN: XNOR-based oblivious deep neural network inference". In: SEC '19.
[93]
Valerie Chen, Valerio Pastro, and Mariana Raykova. "Secure Computation for Machine Learning With SPDZ". In: NeurIPS '18.
[94]
Anselme Tueno, Florian Kerschbaum, and Stefan Katzenbeisser. "Private Evaluation of Decision Trees using Sublinear Cost". In: PETS '19.
[95]
Valeria Nikolaenko, Udi Weinsberg, Stratis Ioannidis, Marc Joye, Dan Boneh, and Nina Taft. "Privacy-preserving ridge regression on hundreds of millions of records". In: S&P '13.
[96]
Adrià Gascón, Phillipp Schoppmann, Borja Balle, Mariana Raykova, Jack Doerner, Samee Zahur, and David Evans. "Privacy-Preserving Distributed Linear Regression on High-Dimensional Data." In: PETS '17.
[97]
Hao Chen, Miran Kim, Ilya Razenshteyn, Dragos Rotaru, Yongsoo Song, and Sameer Wagh. "Maliciously secure matrix multiplication with applications to private deep learning". In: ASIACRYPT '20.
[98]
Wenting Zheng, Ryan Deng, Weikeng Chen, Raluca Ada Popa, Aurojit Panda, and Ion Stoica. "Cerebro: A Platform for Multi-Party Cryptographic Collaborative Learning". In: SEC '21.
[99]
Christopher A. Choquette-Choo, Natalie Dullerud, Adam Dziedzic, Yunxiang Zhang, Somesh Jha, Nicolas Papernot, and Xiao Wang. "CaPC Learning: Confidential and Private Collaborative Learning". In: Arxiv:2102.05188. 2021.
[100]
Mark Abspoel, Daniel Escudero, and Nikolaj Volgushev. "Secure training of decision trees with continuous attributes". In: PETS '21.
[101]
Frank R Hampel, Elvezio M Ronchetti, Peter J Rousseeuw, and Werner A Stahel. Robust statistics: The approach based on influence functions. Vol. 196. John Wiley & Sons, 2011.
[102]
Peter J Huber. Robust statistics. Vol. 523. John Wiley & Sons, 2004.
[103]
Ricardo A Maronna, R Douglas Martin, Victor J Yohai, and Matías Salibián-Barrera. Robust statistics: Theory and methods (with R). John Wiley & Sons, 2019.

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Guide Proceedings
SEC '23: Proceedings of the 32nd USENIX Conference on Security Symposium
August 2023
7552 pages
ISBN:978-1-939133-37-3

Sponsors

  • Meta
  • Google Inc.
  • NSF
  • IBM
  • Futurewei Technologies

Publisher

USENIX Association

United States

Publication History

Published: 09 August 2023

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Acceptance Rates

Overall Acceptance Rate 40 of 100 submissions, 40%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 0
    Total Downloads
  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 13 Dec 2024

Other Metrics

Citations

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media